Abstract

We present a simple compiler, consisting of only 2000 lines of ML, for a strict, impure, monomorphic, and higher-order functional language. Although this language is minimal, our compiler generates code as fast as that of standard compilers like Objective Caml and GCC for several applications including ray tracing, written in the optimal style of each language implementation. Our primary purpose is education at the undergraduate level: to convince students (as well as average programmers) that functional languages are simple and efficient.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors - Compilers; D.3.2 [Programming Languages]: Language Classifications - Applicative (functional) languages

General Terms Languages, Design 

Keywords ML, Objective Caml, Education, Teaching 

1. Introduction 

The Meta Language, or ML, is a great programming language. It is one of the very few languages that meet rather challenging and often conflicting demands, such as efficiency, simplicity, expressivity, and safety, at the same time. ML is the only language ranked within the top three both for efficiency (runtime speed) and for simplicity (code lines) at an informal benchmark site [2] that compares various programming language implementations. ML is also the language most used by the winners of the ICFP programming contests [3].

Unfortunately, however, it is also an undeniable fact that ML is 
a “Minor Language” in the sense that it is not as widespread as C 
or Perl, even though the situation is getting better thanks to mature 
implementations such as Objective Caml. 

Why is ML not so popular? The shortest answer is: because it is not well known! However, looking more carefully into this obvious tautology, I find that one of the reasons (among others [24]) for this “negative spiral of social inertia” is misconceptions about implementations. Nowadays, a number of programmers learn ML, but they often think “I will not use it since I do not understand how it works.” Or, even worse, many of them make incorrect assumptions based on arbitrary misunderstandings about implementation methods. To give a few real examples:

• “Higher-order functions can be implemented only by interpreters” (reason: they do not know function closures).

• “Garbage collection is possible only in byte code” (reason: they only know Java and its virtual machine).

• “Functional programs consume memory because they cannot reuse variables, and therefore require garbage collection” (reason: ???).

Obviously, these statements must be corrected, in particular when 
they are uttered from the mouths of our students. 

But how? It does not suffice to give short lessons like “higher-order functions are implemented by closures,” because they often lead to another myth such as “ML functions are inefficient because they are implemented by closures.” (In fact, thanks to known function call optimization, ML functions are just as efficient as C functions if they can be written in C at all, except that function pointers can sometimes be used in more efficient ways than function closures.) In order to get rid of the essential prejudice that leads to such ill-informed utterances, we end up giving a full course on how to implement an efficient compiler of a functional language. (Throughout this paper, an efficient compiler means a compiler that generates fast code, not a compiler that is itself fast.) To this end, we need a simple but efficient compiler that can be understood even by undergraduate students or average programmers.

The MinCaml Compiler was developed for this purpose. It is a compiler from a strict, impure, monomorphic, and higher-order functional language (which is itself also called MinCaml and whose syntax is a subset of Objective Caml) to SPARC Assembly. Although it is implemented in only 2000 lines of Objective Caml, its efficiency is comparable to that of OCamlOpt (the optimizing compiler of Objective Caml) or GCC for several applications written in the optimal style of each language implementation.

Curricular Background. MinCaml has been used since 2001 in a third-year undergraduate course at the Department of Information Science at the University of Tokyo. The course is simply called Compiler Experiments (in general, we do not refer to courses by numbers in Japan); students are required to implement their own compiler of the MinCaml language from scratch¹, given both high-level and medium-level descriptions in natural language and mathematical pseudo-code (as in Sections 4.3 and 4.4). Although the course schedule varies every year, a typical pattern looks like Table 1.


1 The source code of MinCaml was not publicly available until March 2005. 



Week  Topics
1     Introduction, lexical analysis, parsing
2     K-normalization
3     α-conversion, β-reduction, reduction of nested let-expressions
4     Inline expansion, constant folding, elimination of unnecessary definitions
5     Closure conversion, known function call optimization
6     Virtual machine code generation
7     Function calling conventions
8     Register allocation
9     Register spilling
10    Assembly generation
11    Tail call optimization, continuation passing style
12    Type inference, floating-point number operations
13    Garbage collection [no implementation required]
14    Type-based analyses (case study: escape analysis) [no implementation required]

Table 1. Course Schedule


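$$
\begin{array}{ll}
M, N, e ::= & \text{expressions} \\
\quad c & \text{constants} \\
\quad op(M_1, \ldots, M_n) & \text{arithmetic operations} \\
\quad \mathtt{if}\ M\ \mathtt{then}\ N_1\ \mathtt{else}\ N_2 & \text{conditional branches} \\
\quad \mathtt{let}\ x = M\ \mathtt{in}\ N & \text{variable definitions} \\
\quad x & \text{variables} \\
\quad \mathtt{let\ rec}\ x\ y_1 \ldots y_n = M\ \mathtt{and} \ldots \mathtt{in}\ N & \text{function definitions} \\
\quad M\ N_1 \ldots N_n & \text{function applications} \\
\quad (M_1, \ldots, M_n) & \text{tuple creations} \\
\quad \mathtt{let}\ (x_1, \ldots, x_n) = M\ \mathtt{in}\ N & \text{reading from tuples} \\
\quad \mathtt{Array.create}\ M\ N & \text{array creations} \\
\quad M_1.(M_2) & \text{reading from arrays} \\
\quad M_1.(M_2)\ \texttt{<-}\ M_3 & \text{writing to arrays} \\[1ex]
\tau ::= & \text{types} \\
\quad \pi & \text{primitive types} \\
\quad \tau_1 \to \ldots \to \tau_n \to \tau & \text{function types} \\
\quad \tau_1 \times \ldots \times \tau_n & \text{tuple types} \\
\quad \tau\ \mathtt{array} & \text{array types} \\
\quad \alpha & \text{type variables}
\end{array}
$$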


Compiler Experiments is associated with another course named Processor Experiments, where groups of students design and implement their own CPUs using programmable LSIs called FPGAs (field-programmable gate arrays). Then, they develop compilers for those CPUs, execute ray tracing, and compete on the speed of execution.² The goal of these courses is to understand how computer hardware and software work without treating them as black boxes (which leads to misconceptions).

Since students in Tokyo study only liberal arts for the first year and a half, these courses are in fact scheduled in the third semester of the information science major curriculum. By then, the students have learned Scheme, ML, and Prolog in addition to C/C++ and SPARC Assembly (during courses on operating systems and computer architecture) as well as Java (in the liberal arts courses). In particular, they have already learned how to write a simple interpreter for a small subset of ML.

Furthermore, they have already taken lectures on compilers of imperative languages (including standard algorithms for lexical analysis, parsing, and register allocation) for one semester. The purpose of our course is to teach efficient compilation of functional languages, rather than to give a general compiler course using functional languages.

Design Policy. Given this background, MinCaml was designed with three requirements in mind: (1) It must be understood in every detail by undergraduate students (through 14 hours of lectures and 42 hours of programming). (2) It must be able to execute at least one non-trivial application: ray tracing. (3) It must be as efficient as standard compilers for this application and other typical small programs. Thus, it makes no sense to try to implement the full functionality of ML. To meet the first requirement, MinCaml supports only a minimal subset of ML sufficient to meet the other two. In particular, we have dropped polymorphism and data types as well as modules and garbage collection, though basic implementation techniques for these features are still covered in class.

To make the compiler easier to understand, every design deci- 
sion is clearly motivated, as described in the following sections. 

Paper Overview. Section 2 presents the source language, MinCaml, and Section 3 discusses the design of our compiler. Section 4 elaborates on its details. Section 5 gives the results of our experiments to show the efficiency of the compiler. Section 6 compares our approach with related work and Section 7 concludes with future directions.

2 This competition started in 1995, and its official record was held by the author’s group from 1998, when they took the course, until the FPGA was upgraded in 2003.

Figure 1. Syntax of MinCaml

The implementation and documentation of MinCaml are publicly available at http://min-caml.sf.net/index-e.html. Readers are invited (though not required) to consult them when the present paper refers to implementation details.

2. The Language 

The source language of MinCaml is a minimal subset of Objective 
Caml, whose abstract syntax and types are given in Figure 1. 
This abstract syntax is designed for the following reasons. First 
of all, as any practical functional language does, we have basic 
values such as integers, floating-point numbers, booleans, tuples, 
and functions. Each of them requires at least one constructor (such 
as constants, function definitions, and tuple creations) and one 
destructor (such as arithmetic operations, conditional branches, 
function applications, and reading from tuples). 

Conditional branches must be a special form since we do not have more general data types and pattern matching. Tuples are destructed using a simple form of pattern matching, instead of projections like #i(M). This avoids the flex record problem: functions such as f(x) = #i(x) do not have principal types in the standard type system of ML without record polymorphism [19, 22].

Higher-order functions are supported since functions can be referred to just as variables, and since nested function definitions with free variables are allowed. For simplicity, however, partial function applications are not automatically recognized by the compiler and must be written explicitly by hand, for example as let rec f3 y = f 3 y in f3 instead of just f 3 if f is defined to take two arguments. In this respect, our language is more similar to Scheme than to ML.
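As a small illustration in Objective Caml syntax (the function f and its arithmetic body are hypothetical), the explicit wrapper below is what a MinCaml programmer writes where Objective Caml would accept the partial application f 3 directly.

```ocaml
(* A two-argument function; its body is arbitrary, for illustration only.
   In MinCaml, both functions below would be introduced with let rec. *)
let f x y = x * 10 + y

(* Objective Caml accepts (f 3) as a partial application.
   In MinCaml, the same function value must be written out by hand. *)
let f3 y = f 3 y

(* f3 can now be passed around or applied like any other function. *)
let result = f3 4
```

The wrapper costs one extra definition but keeps every function type n-ary, which is what the type language of Figure 1 assumes.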

Since our primary application is ray tracing, we also need arrays for efficient representation of vectors and matrices. Array construction must be a special syntactic form because it is parametric in the element type. (Objective Caml can express this by using a polymorphic library function, but MinCaml cannot.) Once we have arrays, reference cells can also be implemented as arrays with just one element. This implementation does not affect efficiency, since we have no boundary checks for array accesses anyway.
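A small example of the one-element-array encoding, written in Objective Caml so that it can be run directly. Note that current Objective Caml spells array creation Array.make, whereas MinCaml's syntax (Figure 1) uses Array.create; the function name sum_to is hypothetical.

```ocaml
(* Sum 1..n using a reference cell encoded as a one-element array.
   The trailing comments show the equivalent code with OCaml's ref. *)
let sum_to n =
  let acc = Array.make 1 0 in        (* let acc = ref 0 *)
  let rec loop i =
    if i > n then ()
    else begin
      acc.(0) <- acc.(0) + i;        (* acc := !acc + i *)
      loop (i + 1)
    end
  in
  loop 1;
  acc.(0)                            (* !acc *)
```

Reading and writing the cell are plain array accesses, so without boundary checks they compile to single loads and stores.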

The types are standard monomorphic types except for n-ary function types, which reflect the lack of partial function applications as mentioned above, and type variables, which will be used for type inference.

This abstract syntax and these types are literally implemented as the ML data types Syntax.t and Type.t, except that bound variables in Syntax.t are annotated with elements of Type.t to keep the type information for later use. Also, for readability, a function definition is represented by a record with three fields name, args and body instead of a triple.
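For concreteness, the following is an abridged sketch of how Figure 1 might be rendered as Objective Caml data types. Only a representative subset of expression constructors is shown, variable names are plain strings, and the exact constructor set of MinCaml's own Syntax.t and Type.t may differ; make_adder_program and fresh are illustrative names.

```ocaml
(* Types of Figure 1. A type variable is a mutable cell that type
   inference fills in (None means "not yet known"). *)
module Type = struct
  type t =
    | Unit
    | Bool
    | Int
    | Float
    | Fun of t list * t            (* n-ary: no partial application *)
    | Tuple of t list
    | Array of t
    | Var of t option ref
end

(* Abridged expressions: bound variables carry a Type.t annotation,
   and function definitions are records rather than triples. *)
module Syntax = struct
  type t =
    | Int of int
    | Add of t * t
    | If of t * t * t
    | Let of (string * Type.t) * t * t
    | Var of string
    | LetRec of fundef * t
    | App of t * t list
    | Tuple of t list
    | LetTuple of (string * Type.t) list * t * t
    | Array of t * t               (* Array.create M N *)
    | Get of t * t                 (* M1.(M2) *)
    | Put of t * t * t             (* M1.(M2) <- M3 *)
  and fundef = { name : string * Type.t; args : (string * Type.t) list; body : t }
end

(* A fresh, not-yet-inferred type for each binder. *)
let fresh () = Type.Var (ref None)

(* let rec make_adder x = (let rec adder y = x + y in adder) in (make_adder 3) 7 *)
let make_adder_program =
  Syntax.(
    LetRec
      ( { name = ("make_adder", fresh ()); args = [ ("x", fresh ()) ];
          body =
            LetRec
              ( { name = ("adder", fresh ()); args = [ ("y", fresh ()) ];
                  body = Add (Var "x", Var "y") },
                Var "adder" ) },
        App (App (Var "make_adder", [ Int 3 ]), [ Int 7 ]) ))
```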

Why Objective Caml? We have chosen Objective Caml as the meta language of MinCaml, as well as using its subset as the object language. This is simply because Objective Caml is the only statically typed functional language that is taught well in our curriculum (and because static typing, in particular the exhaustiveness check of pattern matching, helps a great deal when students write a compiler). Otherwise, either Standard ML or even Haskell would be fine as well (though laziness might affect efficiency and require more optimizations such as strictness analysis and linearity analysis).

Why no data types and no pattern matching? As already stated above, we have omitted data types (including lists) and pattern matching. This may sound disappointing, as almost all real programs in ML use them. However, we find that pattern matching is far more complex than other features when compiling core ML. In addition, it is still possible to write interesting programs without using any data types (or pattern matching), as shown in Section 5. Besides, the students are already busy enough implementing other, more basic features such as higher-order functions. A possible alternative would be to supply a pattern matcher implemented by the instructor, but we rejected this approach because the whole point of our course was to avoid such a “black box” in the compiler.

Why no polymorphism? We have omitted polymorphism as well. Since we do not have data types at all (let alone polymorphic data types), there is much less use for polymorphic functions. In addition, polymorphic functions may affect even the efficiency of monomorphic functions (if implemented by boxing floating-point numbers, for example). On the other hand, it would not be too hard to implement polymorphic functions by code duplication (as in MLton [4]) without sacrificing efficiency. Polymorphic type inference would not be a big problem, either, because it is only a little more complex than the monomorphic version.

In the actual course, we offer a brief explanation of the two basic methods mentioned above (boxing and code duplication) for implementing polymorphism. Type inference with let-polymorphism is taught (and implemented in a simple interpreter) in a previous course on ML programming.

Why no garbage collection? Once we have decided to drop data 
types, many of the interesting programs can be written in natural 
ways without allocating too many heap objects. As a result, they 
can run with no garbage collection at all. 

Of course, however, garbage collection is a fundamental feature 
of most modern programming languages. Thus, we offer a lecture 
on garbage collection for 4 hours (with no programming tasks), 
covering basic algorithms such as reference counting, copying GC, 
and mark-and-sweep GC as well as more advanced topics (without 
too much detail) including incremental GC, generational GC, and 
conservative GC. 

Why no array boundary checks? While it is easy to implement array boundary checks in MinCaml, we omitted them by default for a fairer comparison with C (and OCamlOpt -unsafe) in Section 5. Optimizing away redundant checks would be much harder, as it requires the compiler to solve integer constraints in general.

Module                                    LoC
Lexical analysis (in OCamlLex)            102
Parsing (in OCamlYacc)                    175
Type inference                            174
K-normalization                           195
α-conversion                               52
β-reduction                                43
Reduction of nested let-expressions        22
Inline expansion                           47
Constant folding                           50
Elimination of unnecessary definitions     39
Closure conversion                        136
Virtual machine code generation           208
13-bit immediate optimization              42
Register allocation                       262
Assembly generation                       256

Table 2. Main Modules of The MinCaml Compiler

External functions and arrays. Unlike in ordinary ML, free variables in MinCaml programs are automatically treated as external (either external functions or external arrays), so their declarations can be omitted. This is just for simplicity: since MinCaml is simply typed, their types can easily be inferred from the programs [15].

3. The Compiler 

The main modules of The MinCaml Compiler are listed in Table 2, 
along with their lines of code. This section discusses major choices 
that we have made in their design and implementation. Further 
details about the internal structure of MinCaml are described in 
Section 4. 

Lexical analysis and parsing. Although syntax used to be a central issue in conventional compiler courses, we spend as little time as possible on it in our course: we just give students our lexer and parser written in OCamlLex and OCamlYacc. The reason is that lexical analysis and parsing have already been taught in another compiler course (for imperative languages) in our curriculum. Possible alternatives to Lex and Yacc would be parser combinators or packrat parsing [12], but we did not adopt them as syntax is out of scope anyway.

K-normalization. After parsing (and type inference), we use K-normal forms [7] as the central intermediate language in MinCaml. K-normal forms are useful as they make intermediate computations and their results explicit, simplifying many optimizations including inline expansion and constant folding.

We did not choose A-normal forms [11] because we did not need them: A-normalizing all K-normal forms would help little in our compiler. On the contrary, requiring the intermediate code to be in A-normal form (i.e., forbidding nested let-expressions) complicates inline expansion.

In addition, A-normalization in the strict sense [11] eliminates all conditional branches in non-tail positions by duplicating their evaluation contexts, which can cause code explosion if applied literally. But if we allow conditional branches in non-tail positions, we lose the merit of A-normal forms that $e_1$ in let x = $e_1$ in $e_2$ is always an atomic expression (because $e_1$ may be a conditional branch if $e_{11}$ then $e_{12}$ else $e_{13}$, where $e_{12}$ and $e_{13}$ can themselves be let-expressions).

We did not choose CPS [5], either, for a similar reason: it does not allow conditional branches in non-tail positions and requires extra creation of the continuation closure (or inline expansion of it).

Inline expansion. Our algorithm for inlining is rather simple: it expands all calls to functions whose size (number of syntactic nodes in K-normal form) is less than a constant threshold given by the user. Since this does not always terminate when repeated (e.g., consider let rec f x = f x in f 3), the number of iterations is also bounded by a user-given constant. Although this may seem too simple, it works well for our programs, including recursive functions as well as loops implemented by tail recursion (achieving the effect of loop unrolling), with a reasonable increase in code size. By contrast, other inlining algorithms (see [20] for example) are much more complex than ours. Note that our inlining algorithm is implemented in only 47 lines, including the “size” function.
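The following Objective Caml sketch illustrates the size-threshold scheme on a hypothetical miniature K-normal-form language. The names size, alpha, inline and iterate, the fresh-name format x.1, x.2, ..., and the pass structure are assumptions for illustration, not MinCaml's actual Inline module. Each inlined body is renamed with fresh variables (cf. α-conversion in Section 4.4), so that copies of the same body do not share binders.

```ocaml
module M = Map.Make (String)

(* A tiny K-normal-form language, just enough to show inlining. *)
type t =
  | Int of int
  | Var of string
  | Add of string * string
  | Let of string * t * t
  | LetRec of fundef * t
  | App of string * string list
and fundef = { name : string; args : string list; body : t }

(* Number of syntactic nodes. *)
let rec size = function
  | Let (_, e1, e2) -> 1 + size e1 + size e2
  | LetRec ({ body; _ }, e) -> 1 + size body + size e
  | Int _ | Var _ | Add _ | App _ -> 1

let counter = ref 0
let genid x = incr counter; Printf.sprintf "%s.%d" x !counter
let find x env = match M.find_opt x env with Some y -> y | None -> x

(* Rename every binder to a fresh name; unmapped names stay as they are. *)
let rec alpha env = function
  | Int n -> Int n
  | Var x -> Var (find x env)
  | Add (x, y) -> Add (find x env, find y env)
  | Let (x, e1, e2) ->
      let x' = genid x in
      Let (x', alpha env e1, alpha (M.add x x' env) e2)
  | LetRec ({ name; args; body }, e) ->
      let env = M.add name (genid name) env in
      let args' = List.map genid args in
      let env' = List.fold_left2 (fun m a a' -> M.add a a' m) env args args' in
      LetRec ({ name = find name env; args = args'; body = alpha env' body }, alpha env e)
  | App (f, ys) -> App (find f env, List.map (fun y -> find y env) ys)

(* One pass: expand every call to a known function whose body is smaller
   than the threshold, mapping formal parameters to actual arguments. *)
let rec inline threshold known = function
  | Let (x, e1, e2) -> Let (x, inline threshold known e1, inline threshold known e2)
  | LetRec ({ name; args; body }, e) ->
      let known =
        if size body < threshold then M.add name (args, body) known else known
      in
      LetRec ({ name; args; body = inline threshold known body },
              inline threshold known e)
  | App (f, ys) when M.mem f known ->
      let args, body = M.find f known in
      let env = List.fold_left2 (fun m a y -> M.add a y m) M.empty args ys in
      alpha env body
  | e -> e

(* Expansion need not terminate, so the number of passes is bounded. *)
let rec iterate n threshold e =
  if n = 0 then e else iterate (n - 1) threshold (inline threshold M.empty e)
```

For example, with a threshold of 10, one pass turns let rec double x = x + x in let a = double z in a into the same definition followed by let a = z + z in a; with a threshold of 1 the call is left alone.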

Closure conversion. MinCaml supports higher-order functions by closure conversion. It optimizes calls to known functions with no free variables. Again, this known function optimization is simple and effective enough for our purpose. Indeed, it optimizes all function calls in the critical parts of our benchmark applications.

In addition to K-normal form expressions, we introduce three special constructs make_closure, apply_closure, and apply_direct (which means known function calls) in the intermediate language after closure conversion. This enables us to keep the simple types without more advanced mechanisms such as existential types [17].
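To make the three constructs concrete, here is a hand translation into Objective Caml of the make_adder example from Section 4. The record type closure and the helper names adder_code and apply_closure are hypothetical: MinCaml emits these constructs in its own intermediate language, not as Objective Caml code, and this sketch does not claim to reproduce its actual closure layout.

```ocaml
(* Source (MinCaml):
     let rec make_adder x =
       let rec adder y = x + y in
       adder in
     (make_adder 3) 7 *)

(* A closure pairs a top-level code pointer with the values of the free
   variables it needs. In this example every free variable is an int. *)
type closure = { code : int array -> int -> int; fv : int array }

(* adder, lifted to the top level: its free variable x is now read
   from the closure's environment instead of the enclosing scope. *)
let adder_code fv y = fv.(0) + y

(* make_adder has no free variables, so it needs no closure and can be
   called directly (apply_direct). Its body builds adder's closure
   (make_closure). *)
let make_adder x = { code = adder_code; fv = [| x |] }

(* apply_closure: call an unknown function through its code pointer,
   passing the closure's own free variables. *)
let apply_closure c y = c.code c.fv y

let result = apply_closure (make_adder 3) 7
```

Only the call through the closure returned by make_adder needs the indirect apply_closure; the call to make_adder itself is an ordinary direct call.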

Register allocation. The most sophisticated process in MinCaml (or perhaps in any modern compiler) is register allocation. Although many standard algorithms exist for imperative languages (e.g., [8, 21]), we find them unnecessarily complicated for MinCaml because its variables are never destructively updated, obviating the standard notion of “def-use chains” [18]. In addition, it is always better to spill a variable as early as possible, if at all. Thus, we have adopted a simpler greedy algorithm with backtracking (for early spilling) and look-ahead (for register targeting [5]). We need not worry about coverage, because standard algorithms have already been taught in the other compiler course (for imperative languages).

4. Inside The MinCaml Compiler 

The architecture of MinCaml adheres to the following principle. 
A compiler, by definition, is a program that transforms high-level 
programs to lower-level code. For example, the ML function 
let rec gcd m n =
  if m = 0 then n else
  if m <= n then gcd m (n - m) else
  gcd n (m - n)

is compiled into the SPARC Assembly code

gcd.7:
        cmp     %i2, 0
        bne     be_else.18
        nop
        mov     %i3, %i2
        retl
        nop
be_else.18:
        cmp     %i2, %i3
        bg      ble_else.19
        nop
        sub     %i3, %i2, %i3
        b       gcd.7
        nop
ble_else.19:
        sub     %i2, %i3, %o5
        mov     %i3, %i2
        mov     %o5, %i3
        b       gcd.7
        nop


which, at first glance, looks totally different. The MinCaml Compiler, like many other modern compilers, bridges this huge gap by defining appropriate intermediate languages and applying simple program transformations one by one. The five major gaps between MinCaml and SPARC Assembly are:

1. Types. MinCaml has a type discipline; assembly does not. 

2. Nested expressions. MinCaml code is tree-structured with com- 
pound instructions; assembly code is a linear sequence of 
atomic instructions. 

3. Nested function definitions. In MinCaml, we can define a function inside another function like

     let rec make_adder x =
       let rec adder y = x + y in
       adder in
     (make_adder 3) 7

   but assembly has only top-level “labels.”

4. MinCaml has data structures such as tuples and arrays, while 
assembly does not. 

5. In MinCaml, we can use as many variables as we want, but 
only a limited number of registers are available in assembly 
(and therefore they sometimes must be spilled to memory). 

To bridge these gaps, MinCaml applies translations such as type inference, K-normalization, closure conversion, virtual machine code generation, and register allocation (in this order). In what follows, we explain the compilation processes of MinCaml, including these translations and other optimizations.

4.1 Lexical Analysis and Parsing (102 + 175 Lines) 


type Id.t      (* variable names *)
type 'a M.t    (* finite maps from Id.t to 'a *)
type S.t       (* finite sets of Id.t *)
type Type.t    (* types *)
type Syntax.t  (* expressions *)


The lexical analysis and parsing of MinCaml are implemented with standard tools, OCamlLex and OCamlYacc. As usual, they translate a string of characters to a sequence of tokens and then to an abstract syntax tree, which is necessary for any complex program manipulation. There is nothing special to note: indeed, in our course, the files lexer.mll and parser.mly are just given to students in order to avoid the overhead of learning the tools themselves. The only non-trivial point, if any, is function arguments: to parse x-3 as integer subtraction rather than the function application x(-3), the parser distinguishes “simple expressions” (expressions that can be function arguments with no extra parentheses) from other expressions like -3, just as Objective Caml does.
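The distinction between simple and other expressions can be illustrated by a hand-written recursive-descent parser over a hypothetical token type. This is a sketch, not the OCamlYacc grammar in parser.mly: it covers only variables, integers, subtraction, unary minus, parentheses and application, and all names (token, expr, parse_simple, parse_app, parse_exp) are illustrative.

```ocaml
type token = IDENT of string | INT of int | MINUS | LPAREN | RPAREN

type expr =
  | Var of string
  | Int of int
  | Neg of expr
  | Sub of expr * expr
  | App of expr * expr list

(* Simple expressions: those that may appear as function arguments
   without extra parentheses. A leading MINUS is not simple. *)
let rec parse_simple = function
  | IDENT x :: rest -> Some (Var x, rest)
  | INT n :: rest -> Some (Int n, rest)
  | LPAREN :: rest -> (
      match parse_exp rest with
      | e, RPAREN :: rest -> Some (e, rest)
      | _ -> failwith "expected )")
  | _ -> None

(* A simple expression applied to zero or more simple arguments,
   or a unary minus applied to such an expression. *)
and parse_app toks =
  match parse_simple toks with
  | Some (f, rest) ->
      let rec args acc toks =
        match parse_simple toks with
        | Some (a, rest) -> args (a :: acc) rest
        | None -> (List.rev acc, toks)
      in
      (match args [] rest with
       | [], rest -> (f, rest)
       | xs, rest -> (App (f, xs), rest))
  | None -> (
      match toks with
      | MINUS :: rest ->
          let e, rest = parse_app rest in
          (Neg e, rest)
      | _ -> failwith "syntax error")

(* Left-associative binary subtraction over applications. *)
and parse_exp toks =
  let lhs, rest = parse_app toks in
  let rec loop lhs = function
    | MINUS :: rest ->
        let rhs, rest = parse_app rest in
        loop (Sub (lhs, rhs)) rest
    | toks -> (lhs, toks)
  in
  loop lhs rest
```

Because MINUS never starts a simple expression, the tokens of x-3 stop the argument list after x and are read as a subtraction, while x(-3) needs the parentheses to pass -3 as an argument.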

4.2 Type Inference (174 Lines) 

val Typing.f : Syntax.t -> Syntax.t
val Typing.extenv : Type.t M.t ref

(* "private" function to destructively substitute type variables *)
val Typing.g : Type.t M.t -> Syntax.t -> Type.t


Since MinCaml is an implicitly typed language but our compiler relies on the code being annotated with types, we first carry out a monomorphic version of Hindley-Milner type inference. It is also implemented in a standard way, representing type variables as Type.t option ref and substituting None with Some τ during unification. The only deviation is our treatment of external variables: when the type checker sees a free variable not found in the type environment, this variable is assumed to be “external” and added to a special type environment Typing.extenv for external variables. Thus, they need not be declared in advance: their types are inferred just as for ordinary variables. This principal typing functionality is peculiar to MinCaml and not applicable to full ML with let-polymorphism [15]. After type inference, instantiated type variables (references to Some τ) are replaced with their contents (type τ). Any uninstantiated type variables are defaulted (arbitrarily) to int.
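A minimal sketch of the unification step described above, assuming a Type.t-like data type whose variables are t option ref cells. The names occur, unify and deref and the exception Unify are illustrative and not necessarily those used in MinCaml's typing module. The sketch includes an occurs check and the final pass that replaces instantiated variables by their contents and defaults the rest to int.

```ocaml
type t =
  | Unit
  | Bool
  | Int
  | Float
  | Fun of t list * t
  | Tuple of t list
  | Array of t
  | Var of t option ref

exception Unify of t * t

(* Occurs check: does the unassigned variable r appear inside ty? *)
let rec occur r = function
  | Fun (ts, t) -> List.exists (occur r) ts || occur r t
  | Tuple ts -> List.exists (occur r) ts
  | Array t -> occur r t
  | Var r' when r == r' -> true
  | Var { contents = Some t } -> occur r t
  | _ -> false

(* Destructively make t1 and t2 equal by filling in None cells. *)
let rec unify t1 t2 =
  match t1, t2 with
  | Unit, Unit | Bool, Bool | Int, Int | Float, Float -> ()
  | Fun (ts1, res1), Fun (ts2, res2) when List.length ts1 = List.length ts2 ->
      List.iter2 unify ts1 ts2;
      unify res1 res2
  | Tuple ts1, Tuple ts2 when List.length ts1 = List.length ts2 ->
      List.iter2 unify ts1 ts2
  | Array a1, Array a2 -> unify a1 a2
  | Var r1, Var r2 when r1 == r2 -> ()
  | Var { contents = Some t1' }, _ -> unify t1' t2
  | _, Var { contents = Some t2' } -> unify t1 t2'
  | Var ({ contents = None } as r1), _ ->
      if occur r1 t2 then raise (Unify (t1, t2));
      r1 := Some t2
  | _, Var ({ contents = None } as r2) ->
      if occur r2 t1 then raise (Unify (t1, t2));
      r2 := Some t1
  | _ -> raise (Unify (t1, t2))

(* After inference: replace instantiated variables by their contents and
   default uninstantiated ones to Int. *)
let rec deref = function
  | Fun (ts, t) -> Fun (List.map deref ts, deref t)
  | Tuple ts -> Tuple (List.map deref ts)
  | Array t -> Array (deref t)
  | Var ({ contents = None } as r) -> r := Some Int; Int
  | Var ({ contents = Some t } as r) -> let t' = deref t in r := Some t'; t'
  | t -> t
```

Because variables are shared mutable cells, binding one occurrence during unification instantly updates every place in the syntax tree that refers to it, which is why no explicit substitution needs to be threaded through the inference.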

4.3 K-Normalization (195 Lines) 

type KNormal.t  (* K-normalized expressions *)

val KNormal.f : Syntax.t -> KNormal.t
val KNormal.fv : KNormal.t -> S.t


We stated that compilation is about bridging the gaps between high-level programs and low-level code. One of the gaps is nested expressions: usually, a sequence of several instructions (like add, add, and sub) is needed to compute the value of a compound expression (like a+b+c-d).

This gap is bridged by a translation called K-normalization, which defines every intermediate result of computation as a variable. For example, the previous expression can be translated to:

let tmp1 = a + b in
let tmp2 = tmp1 + c in
tmp2 - d

In general, the process of K-normalization can be described as follows. First, we define the abstract syntax of the intermediate language, K-normal forms, which is implemented as the ML data type KNormal.t.

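$$
\begin{array}{rl}
M, N, e ::= & c \\
\mid & op(x_1, \ldots, x_n) \\
\mid & \mathtt{if}\ x = y\ \mathtt{then}\ M\ \mathtt{else}\ N \\
\mid & \mathtt{if}\ x \le y\ \mathtt{then}\ M\ \mathtt{else}\ N \\
\mid & \mathtt{let}\ x = M\ \mathtt{in}\ N \\
\mid & x \\
\mid & \mathtt{let\ rec}\ x\ y_1 \ldots y_n = M\ \mathtt{and} \ldots \mathtt{in}\ N \\
\mid & x\ y_1 \ldots y_n
\end{array}
$$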


The main point is that every operand of basic operations, such as arithmetic operations, function applications, tuple creations, and reading from tuples, is now a variable, not a nested expression, because nested expressions are converted into sequences of let-expressions as in the example above. This conversion can be described by the following function $\mathcal{K}$. For every equation in this definition, all variables not appearing on the left-hand side are freshly generated. Although this function is straightforward, it is presented here to show what the mathematical pseudo-code given to students (and required to be implemented in Objective Caml, as mentioned in Section 1) looks like in general.

JC(c ) = c 

JC(op(Mi, . . . , M„)) = 
let Xi = lC(Mi) in 

let x n = IC(M n ) in 

op(x i, . . . , Xn) 

IC( if Mi = M2 then Ni else N2 ) = 

let x = IC(Mi) in let y = JC(M2) in 
if x = y then IC(Ni) else 1 C(N 2 ) 


1 C{ if M\ M 2 then Ni else N2) = 

K .( if Ati = M2 then N2 else Ni ) 

M(if Mi < M2 then Ni else N2) = 

let x = K-(Mi) in let y = K,{Al2) in 
if x < y then IC(Ni) else K,{N2 ) 

)C(if Mi > M2 then Ni else N2 = 

K.( if M2 < Mi then Ni else N2) 

K,{ if Mi > M2 then Ni else N2) = 

K.( if Mi < M2 then N2 else iVi) 

K,{ if Mi < M2 then Ni else N2) = 

K.( if M2 < Mi then N2 else Ni) 

K,{ if M then Ni else N2) = 

K.( if A 1 = false then N2 else Ni) 

(if M is not a comparison) 

)C(let x — M in N) = 

let x = K.{M) in fC(N) 
tC(x) — x 

K.{ let rec f x 1 . . . x„ = M and ... in N) = 

let rec f x 1 ... x„ = K,{M) and ... in K.(N ) 
K.{M Ni ... N n ) = 
let x = IC(M) in 
let j/i = fC(Ni) in 

let y„ = JC(N„) in 
xyi ■■■ yn 

As is apparent from the definitions above, we also translate conditional branches into two special forms combining comparisons and branches. This translation bridges another gap between MinCaml and assembly, where branch instructions must follow compare instructions. Although unrelated to K-normalization, it is implemented here to avoid introducing yet another intermediate language.

In addition, as an optional optimization, our actual implementation avoids inserting a let-expression if the term is already a variable. This small improvement is implemented in the auxiliary function insert_let.

let insert_let (e, t) k =
  match e with
  | Var(x) -> k x
  | _ ->
      let x = Id.gentmp t in
      let e', t' = k x in
      Let((x, t), e, e'), t'

It takes an expression e (with its type t) and a continuation k, generates a variable x if e is not already a variable, applies k to x to obtain the body e' (with its type t'), inserts a let-expression to bind x to e, and returns it (with t'). The types are passed around just because they are necessary for type annotations of bound variables, and are not essential to K-normalization itself.

This trick not only improves the result of K-normalization but also simplifies its implementation. (This would be yet more evidence that continuations are relevant to let-insertion [16] in general.) For example, the case for integer addition can be coded as

(* in pattern matching over Syntax.t *)
| Syntax.Add(e1, e2) ->
    insert_let (g env e1)
      (fun x -> insert_let (g env e2)
          (fun y -> Add(x, y), Type.Int))

and reading from arrays as:

| Syntax.Get(e1, e2) ->
    (match g env e1 with
    | (_, Type.Array(t)) as g_e1 ->
        insert_let g_e1
          (fun x -> insert_let (g env e2)
              (fun y -> Get(x, y), t))
    | _ -> assert false)

The false assertion in the last line could be removed if K-normalization 
were fused with type inference, but we rejected this alternative in 
favor of modularity. 
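The continuation-based insert_let trick can be tried out in isolation. The sketch below is self-contained and deliberately small: it assumes a hypothetical source language with only integers, variables, addition, subtraction and let, drops the type annotations that MinCaml threads through, and uses a simple counter named gentmp as a stand-in for Id.gentmp.

```ocaml
(* Source expressions: arbitrarily nested arithmetic. *)
type syntax =
  | SInt of int
  | SVar of string
  | SAdd of syntax * syntax
  | SSub of syntax * syntax
  | SLet of string * syntax * syntax

(* K-normal forms: the operands of Add and Sub are always variables. *)
type knormal =
  | Int of int
  | Var of string
  | Add of string * string
  | Sub of string * string
  | Let of string * knormal * knormal

(* Fresh temporaries tmp1, tmp2, ... *)
let counter = ref 0
let gentmp () = incr counter; Printf.sprintf "tmp%d" !counter

(* Name e with a fresh variable unless it already is a variable,
   then build the rest of the term with the continuation k. *)
let insert_let e k =
  match e with
  | Var x -> k x
  | _ ->
      let x = gentmp () in
      Let (x, e, k x)

let rec g = function
  | SInt n -> Int n
  | SVar x -> Var x
  | SAdd (e1, e2) ->
      insert_let (g e1) (fun x -> insert_let (g e2) (fun y -> Add (x, y)))
  | SSub (e1, e2) ->
      insert_let (g e1) (fun x -> insert_let (g e2) (fun y -> Sub (x, y)))
  | SLet (x, e1, e2) ->
      let e1' = g e1 in
      Let (x, e1', g e2)
```

For a+b+c-d, this sketch yields let tmp2 = (let tmp1 = a + b in tmp1 + c) in tmp2 - d. The nested let in the bound position is exactly what the reduction of nested let-expressions (Section 4.6) later flattens into the tmp1/tmp2 sequence shown earlier.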

4.4 α-Conversion (52 Lines)


val Alpha.f : KNormal.t -> KNormal.t

(* also public for reuse by Inline.g *)
val Alpha.g : Id.t M.t -> KNormal.t -> KNormal.t


Following K-normalization, MinCaml renames all bound variables of a program to fresh names, which is necessary for the correctness of transformations such as inlining. It can be specified by the following function $\alpha$, where variables not appearing on the left-hand side are freshly generated and $\varepsilon(x)$ is defined to be $x$ when $x$ is not in the domain of $\varepsilon$.
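$$
\begin{aligned}
\alpha_\varepsilon(c) &= c \\
\alpha_\varepsilon(op(x_1, \ldots, x_n)) &= op(\varepsilon(x_1), \ldots, \varepsilon(x_n)) \\
\alpha_\varepsilon(\mathtt{if}\ x = y\ \mathtt{then}\ M_1\ \mathtt{else}\ M_2) &= \mathtt{if}\ \varepsilon(x) = \varepsilon(y)\ \mathtt{then}\ \alpha_\varepsilon(M_1)\ \mathtt{else}\ \alpha_\varepsilon(M_2) \\
\alpha_\varepsilon(\mathtt{if}\ x \le y\ \mathtt{then}\ M_1\ \mathtt{else}\ M_2) &= \mathtt{if}\ \varepsilon(x) \le \varepsilon(y)\ \mathtt{then}\ \alpha_\varepsilon(M_1)\ \mathtt{else}\ \alpha_\varepsilon(M_2) \\
\alpha_\varepsilon(\mathtt{let}\ x = M\ \mathtt{in}\ N) &= \mathtt{let}\ x' = \alpha_\varepsilon(M)\ \mathtt{in}\ \alpha_{\varepsilon, x \mapsto x'}(N) \\
\alpha_\varepsilon(x) &= \varepsilon(x) \\
\alpha_\varepsilon(\mathtt{let\ rec}\ f\ x_1 \ldots x_m = M_1\ \mathtt{and}\ g\ y_1 \ldots y_n = M_2\ \mathtt{in}\ N) &= \mathtt{let\ rec}\ f'\ x'_1 \ldots x'_m = \alpha_{\varepsilon, \sigma, x_1 \mapsto x'_1, \ldots, x_m \mapsto x'_m}(M_1) \\
&\qquad \mathtt{and}\ g'\ y'_1 \ldots y'_n = \alpha_{\varepsilon, \sigma, y_1 \mapsto y'_1, \ldots, y_n \mapsto y'_n}(M_2) \\
&\qquad \mathtt{in}\ \alpha_{\varepsilon, \sigma}(N) \qquad \text{where } \sigma = f \mapsto f', g \mapsto g', \ldots \\
\alpha_\varepsilon(x\ y_1 \ldots y_n) &= \varepsilon(x)\ \varepsilon(y_1) \ldots \varepsilon(y_n)
\end{aligned}
$$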


It is implemented by a recursive function Alpha.g, which takes a (sub-)expression with a mapping ε from old names to new names and returns an α-converted expression. If a variable is not found in the mapping, it is considered external and left unchanged. This behavior is implemented by the auxiliary function Alpha.find, which is used everywhere in Alpha.g since variables are ubiquitous in K-normal forms.

Naturally, as long as we are just α-converting a whole program, we only need to export the interface function Alpha.f, which calls Alpha.g with an empty mapping. Nevertheless, the internal function Alpha.g is also exported because it is useful for inlining, as explained later.
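A compact sketch of the specification above over a hypothetical fragment of K-normal forms (integers, variables, addition, let, single-function let rec, and application). Here find plays the role described for Alpha.find, leaving unmapped (external) names unchanged; the fresh-name scheme x.1, x.2, ... and the names g and f merely mirror Alpha.g and Alpha.f and are assumptions, not MinCaml's code.

```ocaml
module M = Map.Make (String)

type t =
  | Int of int
  | Var of string
  | Add of string * string
  | Let of string * t * t
  | LetRec of fundef * t
  | App of string * string list
and fundef = { name : string; args : string list; body : t }

(* Fresh names of the form x.1, x.2, ... *)
let counter = ref 0
let genid x = incr counter; Printf.sprintf "%s.%d" x !counter

(* Names not in the mapping are external and are left unchanged. *)
let find x env = try M.find x env with Not_found -> x

let rec g env = function
  | Int n -> Int n
  | Var x -> Var (find x env)
  | Add (x, y) -> Add (find x env, find y env)
  | Let (x, e1, e2) ->
      let x' = genid x in
      Let (x', g env e1, g (M.add x x' env) e2)
  | LetRec ({ name; args; body }, e) ->
      (* the function name (sigma) is in scope in its body and in e *)
      let env = M.add name (genid name) env in
      let args' = List.map genid args in
      let env_body = List.fold_left2 (fun m a a' -> M.add a a' m) env args args' in
      LetRec ({ name = find name env; args = args'; body = g env_body body }, g env e)
  | App (f, ys) -> App (find f env, List.map (fun y -> find y env) ys)

(* Whole-program entry point: start from the empty mapping. *)
let f e = g M.empty e
```

Exposing g with an arbitrary initial mapping is what makes it reusable for inlining: seeding the mapping with parameter-to-argument pairs both substitutes the arguments and renames the copied body in a single traversal.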

4.5 β-Reduction (43 Lines)


val Beta.f : KNormal.t -> KNormal.t 


(* private *) 

val Beta.g : Id.t M.t -> KNormal.t -> KNormal.t 


It is often useful, both for clarity and for efficiency, to reduce expressions such as let x = y in x + y to y + y, eliminating aliases between variables. We call this β-reduction of K-normal forms. (Of course, this name originates from β-reduction in the λ-calculus, of which ours is a special case if let-expressions are represented by applications of λ-abstractions, like (λx. x + y) y for example.) It is not always necessary for ordinary programs, but is sometimes effective after other transformations.

β-reduction in MinCaml is implemented by function Beta.g, which takes an expression with a mapping from variables to equal variables and returns the β-reduced expression. Specifically, when we see an expression of the form let x = e1 in e2, we first β-reduce e1. If the result is a variable y, we add the mapping from x to y and then continue by β-reducing e2. Again, since variables appear everywhere in K-normal forms, the auxiliary function Beta.find is defined and used for brevity (as in α-conversion) to substitute variables if and only if they are found in the mapping.
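The following sketch shows β-reduction on a hypothetical toy AST of the same shape. It assumes that variables are already unique (i.e., α-converted), so substituting y for x cannot capture anything; the constructors and function names are illustrative, not MinCaml's actual code.

```ocaml
type t =
  | Int of int
  | Var of string
  | Add of string * string
  | Let of string * t * t

module M = Map.Make (String)

(* Replace a variable by the one it aliases, if any *)
let find x env = match M.find_opt x env with Some y -> y | None -> x

let rec g env = function
  | Int i -> Int i
  | Var x -> Var (find x env)
  | Add (x, y) -> Add (find x env, find y env)
  | Let (x, e1, e2) ->
      (match g env e1 with
       | Var y -> g (M.add x y env) e2   (* x is an alias of y: drop the let *)
       | e1' -> Let (x, e1', g env e2))

let f e = g M.empty e
```

With this sketch, `let x = y in x + y` becomes `y + y`, and chains of aliases such as `let a = b in let c = a in c` collapse to `b`.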

4.6 Reduction of Nested let-Expressions (22 Lines) 

val Assoc.f : KNormal.t -> KNormal.t


Next, in order to expose the values of nested let-expressions for subsequent transformations, we flatten nested let-expressions such as let x = (let y = e1 in e2) in e3 to let y = e1 in let x = e2 in e3. This “reduction” by itself does not affect the efficiency of programs compiled by MinCaml, but it helps other optimizations (e.g., constant folding of e2) as well as simplifying the intermediate code.

This transformation is implemented by function Assoc.f. Upon seeing an expression of the form let x = e1 in e2, we first reduce e1 to e1' and e2 to e2' by recursion. Then, if e1' is of the form let ... in e, we return the expression let ... in let x = e in e2'. This verbal explanation may sound tricky, but the actual implementation is simple:

(* in pattern matching over KNormal.t *)
| Let(xt, e1, e2) ->
    let rec insert = function
      | Let(yt, e3, e4) -> Let(yt, e3, insert e4)
      | LetRec(fundefs, e) -> LetRec(fundefs, insert e)
      | LetTuple(yts, z, e) -> LetTuple(yts, z, insert e)
      | e -> Let(xt, e, f e2) in
    insert (f e1)

Indeed, assoc.ml consists of only 22 lines as noted above.
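For readers who want to run the idea, here is a self-contained version of the same `insert` trick on a hypothetical three-constructor AST (no LetRec or LetTuple). It is a sketch of the technique, not assoc.ml itself.

```ocaml
type t =
  | Int of int
  | Add of string * string
  | Let of string * t * t

let rec f = function
  | Let (x, e1, e2) ->
      (* push "let x = ... in e2" down into the innermost body of e1 *)
      let rec insert = function
        | Let (y, e3, e4) -> Let (y, e3, insert e4)
        | e -> Let (x, e, f e2)
      in
      insert (f e1)
  | e -> e
```

Applied to `let x = (let y = 1 in y + y) in x + x`, it yields `let y = 1 in let x = y + y in x + x`.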

4.7 Inline Expansion (47 Lines) 

val Inline.threshold : int ref
val Inline.f : KNormal.t -> KNormal.t

(* private *)

val Inline.size : KNormal.t -> int
val Inline.g : ((Id.t * Type.t) list * KNormal.t) M.t ->
               KNormal.t -> KNormal.t


The next optimization is the most effective one: inline expan- 
sion. It replaces calls to small functions with their bodies. MinCaml 
implements it in module Inline as follows. 

Upon seeing a function definition let rec f x1 ... xn = e in ..., we compute the size of e by Inline.size. If this size is less than the value of the integer reference Inline.threshold set by the user, we add the mapping from function name f to the pair of formal arguments and body e. Then, upon seeing a function call f y1 ... yn, we look up the formal arguments x1, ..., xn of f and its body e, and return e with x1, ..., xn substituted by y1, ..., yn.

However, since inlined expressions are copies of function bodies, their variables may be duplicated and therefore must be α-converted again. Fortunately, the previous process of substituting formal arguments with actual arguments can be carried out by Alpha.g together with α-conversion, just by using the correspondence from x1, ..., xn to y1, ..., yn (instead of an empty mapping) as the initial mapping. Thus, the inline expansion can be implemented just as

(* pattern matching over KNormal.t *)
| App(x, ys) when M.mem x env ->
    let (zs, e) = M.find x env in
    let env' =
      List.fold_left2
        (fun env' (z, t) y -> M.add z y env')
        M.empty zs ys in
    Alpha.g env' e

where M is a module for mappings. 
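The sketch below puts the pieces together on a hypothetical toy AST: a size function, a threshold, and inlining by α-converting the callee's body under the argument mapping. The constructors, the default threshold of 10, and the `genid` naming scheme are assumptions; MinCaml itself takes the threshold from its -inline option.

```ocaml
type t =
  | Int of int
  | Var of string
  | Add of string * string
  | Let of string * t * t
  | LetRec of string * string list * t * t  (* let rec f xs = body in cont *)
  | App of string * string list

module M = Map.Make (String)

let counter = ref 0
let genid x = incr counter; Printf.sprintf "%s.%d" x !counter
let find x env = match M.find_opt x env with Some y -> y | None -> x

(* alpha-conversion with a caller-supplied initial mapping *)
let rec alpha env = function
  | Int i -> Int i
  | Var x -> Var (find x env)
  | Add (x, y) -> Add (find x env, find y env)
  | Let (x, e1, e2) ->
      let x' = genid x in
      Let (x', alpha env e1, alpha (M.add x x' env) e2)
  | LetRec (fn, xs, body, cont) ->
      let env = M.add fn (genid fn) env in
      let xs' = List.map genid xs in
      let benv = List.fold_left2 (fun m x x' -> M.add x x' m) env xs xs' in
      LetRec (find fn env, xs', alpha benv body, alpha env cont)
  | App (fn, ys) -> App (find fn env, List.map (fun y -> find y env) ys)

(* size = number of syntactic nodes *)
let rec size = function
  | Let (_, e1, e2) | LetRec (_, _, e1, e2) -> 1 + size e1 + size e2
  | _ -> 1

let threshold = ref 10  (* hypothetical default *)

(* env maps inlinable function names to (formal arguments, body) *)
let rec g env = function
  | Let (x, e1, e2) -> Let (x, g env e1, g env e2)
  | LetRec (fn, xs, body, cont) ->
      let env = if size body < !threshold then M.add fn (xs, body) env else env in
      LetRec (fn, xs, g env body, g env cont)
  | App (x, ys) when M.mem x env ->
      let (zs, e) = M.find x env in
      (* substitute actuals for formals while alpha-converting the copy *)
      let env' = List.fold_left2 (fun m z y -> M.add z y m) M.empty zs ys in
      alpha env' e
  | e -> e

let f e = g M.empty e
```

Note that the stored body is the original one, so a recursive function is unfolded at most once per pass; repeating the pass (as the main routine does) unfolds further.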

4.8 Constant Folding (50 Lines) 

val ConstFold.f : KNormal.t -> KNormal.t 
(* private *) 

val ConstFold.g : KNormal.t M.t -> KNormal.t -> KNormal.t 


Once functions are inlined, many operations have arguments 
whose values are already known, as x + y in let x = 3 in let y = 
7 in x + y. Constant folding carries out such operations at compile- 
time and replaces them with constants like 10. MinCaml imple- 
ments it in function ConstFold.g. It takes an expression with a 
mapping from variables to their definitions, and returns the expres- 
sion after constant folding. For example, given an integer addition 
x + y, it examines whether the definitions of x and y are integer 
constants. If so, it calculates the result and returns it right away. 
Given a variable definition let x = e in ..., on the other hand, it records the mapping from x to e. The same treatment applies to floating-point numbers and tuples.
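As a minimal sketch, assuming a toy AST with only integer constants, variables, additions, and let, constant folding for integer addition can be written as follows. The names are illustrative; handling floats and tuples would add analogous cases.

```ocaml
type t =
  | Int of int
  | Var of string
  | Add of string * string
  | Let of string * t * t

module M = Map.Make (String)

(* Some i if x is known to be bound to the integer constant i *)
let findi x env = match M.find_opt x env with Some (Int i) -> Some i | _ -> None

let rec g env = function
  | Var x -> (match findi x env with Some i -> Int i | None -> Var x)
  | Add (x, y) ->
      (match findi x env, findi y env with
       | Some i, Some j -> Int (i + j)   (* both operands known: fold now *)
       | _ -> Add (x, y))
  | Let (x, e1, e2) ->
      let e1' = g env e1 in
      Let (x, e1', g (M.add x e1' env) e2)  (* remember x's definition *)
  | e -> e

let f e = g M.empty e
```

On `let x = 3 in let y = 7 in x + y` this produces `let x = 3 in let y = 7 in 10`, leaving the now-unused definitions for the next pass to remove.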

4.9 Elimination of Unnecessary Definitions (39 Lines) 

val Elim.f : KNormal.t -> KNormal.t 
(* private *) 

val Elim.effect : KNormal.t -> bool


After constant folding, we often find unused variable definitions 
(and unused function definitions) as in let x = 3 in let y = 
7 in 10. MinCaml removes them in module Elim. 

In general, if e1 has no side effect and x does not appear free in e2, we can replace let x = e1 in e2 just with e2. The presence of side effects is checked by Elim.effect, and the appearance of variables is examined by KNormal.fv. Since it is undecidable whether an expression has a real side effect, we conservatively treat any write to an array and any call to a function as side-effecting.

Mutually recursive functions defined by a single let rec are 
eliminated only when none of the functions is used in the continu- 
ation. If any of the functions are used after the definition, then all 
of them are kept. 
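A minimal sketch of this elimination, assuming a hypothetical toy AST in which `Put` stands for an array write, the only side-effecting construct here. The helper is named `has_effect` rather than `effect` to avoid clashing with the `effect` keyword of recent OCaml releases.

```ocaml
module S = Set.Make (String)

type t =
  | Int of int
  | Var of string
  | Add of string * string
  | Put of string * string * string  (* a.(i) <- v: an array write *)
  | Let of string * t * t

let rec fv = function
  | Int _ -> S.empty
  | Var x -> S.singleton x
  | Add (x, y) -> S.of_list [x; y]
  | Put (a, i, v) -> S.of_list [a; i; v]
  | Let (x, e1, e2) -> S.union (fv e1) (S.remove x (fv e2))

(* conservative: any array write counts as a side effect *)
let rec has_effect = function
  | Put _ -> true
  | Let (_, e1, e2) -> has_effect e1 || has_effect e2
  | _ -> false

let rec f = function
  | Let (x, e1, e2) ->
      let e1' = f e1 and e2' = f e2 in
      if has_effect e1' || S.mem x (fv e2') then Let (x, e1', e2') else e2'
  | e -> e
```

For instance, `let x = 3 in let y = 7 in 10` reduces to `10`, while a binding whose right-hand side writes to an array is kept even if its variable is unused.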

4.10 Closure Conversion (136 Lines) 


type Id.l (* label names *) 

type Closure.t (* closure-converted expressions *)

type Closure.fundef =
  { name : Id.l * Type.t;
    args : (Id.t * Type.t) list;
    formal_fv : (Id.t * Type.t) list;
    body : Closure.t }
type Closure.prog =
  Prog of Closure.fundef list * Closure.t

val Closure.f : KNormal.t -> Closure.prog
val Closure.fv : Closure.t -> S.t

(* private *)

val Closure.toplevel : Closure.fundef list ref
val Closure.g : Type.t M.t (* typenv for fv *) ->
                S.t (* known functions *) ->
                KNormal.t -> Closure.t


Another gap still remaining between MinCaml and assembly is 
nested function definitions, which are flattened by closure conver- 
sion. It is the second most complicated process in our compiler. 
(The first is register allocation, which is described later.) What fol- 
lows is how we explain closure conversion to students. 

The flattening of nested function definitions includes easy cases 
and hard cases. For example, 
let rec quad x = 

let rec dbl y = y + y in 
dbl (dbl x) in 
quad 123 

can be flattened like 

let rec dbl y = y + y;;
let rec quad x = dbl (dbl x);;

quad 123 

just by moving the function definition. However, a similar manipu- 
lation would convert 

let rec make_adder x = 

let rec adder y = x + y in 
adder in 
(make_adder 3) 7 

into 

let rec adder y = x + y;;
let rec make_adder x = adder;;

(make_adder 3) 7 

which makes no sense at all. This is because the function dbl has 
no free variable while adder has a free variable x. 

Thus, in order to flatten function definitions with free variables, 
we have to treat not only the bodies of functions such as adder, but 
also the values of their free variables such as x together. In ML-like 
pseudo code, this treatment can be described as: 
let rec adder x y = x + y;;
let rec make_adder x = (adder, x);;
let (f, fv) = make_adder 3 in 
f fv 7 

First, function adder takes the value of its free variable x as an 
argument. Then, when the function is returned as a value, its body 
is paired with the value of its free variable. This pair is called a 
function closure. In general, when a function is called, its body and 
the values of its free variables are extracted from the closure and 
supplied as arguments. 
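The pseudo code above can be made into ordinary, runnable OCaml by representing a closure explicitly as a record of code and free-variable values. The record type and the name `apply_closure` are illustrative assumptions; in this hand-converted example the environment holds a single integer, the value of x.

```ocaml
(* A closure pairs the code of a function with the values of its
   free variables; here adder has exactly one free variable, x. *)
type closure = { code : int -> int -> int; fv : int }

(* adder after conversion: its free variable x becomes an extra argument *)
let adder x y = x + y

(* make_adder returns the body of adder paired with the value of x *)
let make_adder x = { code = adder; fv = x }

(* a closure-based call extracts the code and the free-variable values *)
let apply_closure c y = c.code c.fv y

let result = apply_closure (make_adder 3) 7
```

Here `result` evaluates to 10, exactly as `(make_adder 3) 7` does in the nested original.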

The simple-minded approach of generating a closure for every 
function is too inefficient. Closure conversion gets more interest- 
ing when we try to separate the functions that require closures 
from those that can be called in more conventional ways. Thus, the 
closure conversion routine Closure.g of MinCaml takes the set
known of functions that are statically known to have no free vari- 
ables (and therefore can be called directly), and converts a given 
expression by using this information. 



The results of closure conversion are represented in data type 
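Closure.t, which represents the following abstract syntax:

$$
\begin{array}{rcll}
P &::=& \{D_1, \ldots, D_n\},\ M & \text{whole program}\\
D &::=& \ell(y_1,\ldots,y_m)(z_1,\ldots,z_n) = N & \text{top-level function definition}\\
M, N, e &::=& c & \text{constants}\\
&\mid& op(x_1,\ldots,x_n) & \text{arithmetic operations}\\
&\mid& \mathtt{if}\ x = y\ \mathtt{then}\ M\ \mathtt{else}\ N & \text{conditional branches}\\
&\mid& \mathtt{if}\ x \le y\ \mathtt{then}\ M\ \mathtt{else}\ N & \text{conditional branches}\\
&\mid& \mathtt{let}\ x = M\ \mathtt{in}\ N & \text{variable definitions}\\
&\mid& x & \text{variables}\\
&\mid& \mathtt{make\_closure}\ x = (\ell, (z_1,\ldots,z_n))\ \mathtt{and}\ \ldots\ \mathtt{in}\ M & \text{closure creation}\\
&\mid& \mathtt{apply\_closure}(x, y_1,\ldots,y_n) & \text{closure-based function call}\\
&\mid& \mathtt{apply\_direct}(\ell, y_1,\ldots,y_n) & \text{direct function call}
\end{array}
$$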


It is similar to KNormal.t, but includes closure creation make_closure and top-level functions D1, ..., Dn instead of nested function definitions. In addition, instead of general function calls, it has closure-based function calls apply_closure and direct function calls apply_direct that do not use closures. Furthermore, in the processes that follow, we distinguish the type of top-level function names (labels) from the type of ordinary variable names in order to avoid confusion. Note that apply_closure uses variables while apply_direct uses labels. This is because closures are bound to variables (by make_closure) while top-level functions are called through labels.

Upon seeing a general function call x y1 ... yn, Closure.g checks if the function x belongs to the set known. If so, it returns apply_direct. If not, it returns apply_closure.

| KNormal.App(x, ys) when S.mem x known ->
    AppDir(Id.L(x), ys)
| KNormal.App(f, xs) ->
    AppCls(f, xs)

Here, AppDir and AppCls are constructors in the Closure module that correspond to apply_direct and apply_closure, S is a module for sets, and Id.L is the constructor for labels.

Function definitions let rec x y1 ... yn = e1 in e2 are processed as follows. First, we assume that the function x has no free variable, add it to known, and convert its body e1. Then, if x indeed has no free variable, we continue the process and convert e2. Otherwise, we rewind the values of known and toplevel (a reference cell holding top-level functions), and redo the conversion of e1. (This may take exponential time with respect to the depth of nested function definitions, which is small in practice.) Finally, if x never appears as a proper variable (rather than a top-level label) in e2, we omit the closure creation make_closure for function x.

This last optimization needs some elaboration. Even if x has no free variable, it may still need a representation as a closure, provided that it is returned as a value (consider, for example, let rec x y = ... in x). This is because a user who receives x as a value does not know in general whether it has a free variable or not, and therefore must use apply_closure anyway to call the function through its closure. In this case, we do not eliminate make_closure since x appears as a variable in e2. However, if x is just called as a function, for example like let rec x y = ... in x 123, then we eliminate the closure creation for x because it appears only as a label (not a variable) in apply_direct.

The closure conversion of mutually recursive functions is a little more complicated. In general, mutually recursive functions can share closures [5], but MinCaml does not implement this sharing. This simplifies the virtual machine code generation, as discussed later. The drawback is that mutually recursive calls to functions with free variables get slower. However, we do not lose the efficiency of mutually recursive calls to functions with no free variables, because they are converted to apply_direct anyway.
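The following is a minimal sketch of the known-function strategy described above: optimistically add f to known, check the converted body for free variables, rewind and redo if the guess was wrong, and emit a closure only when f is used as a variable in the continuation. Both ASTs, the `toplevel` tuple layout, and all names are hypothetical simplifications (no types, no mutual recursion).

```ocaml
module S = Set.Make (String)

(* Hypothetical toy source language (a fragment of K-normal forms) *)
type t =
  | Var of string
  | Add of string * string
  | LetRec of string * string list * t * t  (* let rec f xs = body in cont *)
  | App of string * string list

(* Hypothetical toy target, mirroring the Closure.t constructors *)
type c =
  | CVar of string
  | CAdd of string * string
  | MakeCls of string * string list * c   (* make_closure f = (L_f, zs) in e *)
  | AppDir of string * string list        (* apply_direct: called by label *)
  | AppCls of string * string list        (* apply_closure: called via a variable *)

(* Free variables of converted code; labels in AppDir are not variables *)
let rec cfv = function
  | CVar x -> S.singleton x
  | CAdd (x, y) -> S.of_list [x; y]
  | MakeCls (f, zs, e) -> S.union (S.of_list zs) (S.remove f (cfv e))
  | AppDir (_, ys) -> S.of_list ys
  | AppCls (f, ys) -> S.of_list (f :: ys)

(* Hoisted top-level functions: (label, free variables, arguments, body) *)
let toplevel : (string * string list * string list * c) list ref = ref []

let rec g known = function
  | Var x -> CVar x
  | Add (x, y) -> CAdd (x, y)
  | App (f, ys) when S.mem f known -> AppDir (f, ys)
  | App (f, ys) -> AppCls (f, ys)
  | LetRec (f, xs, body, cont) ->
      let backup = !toplevel in
      (* optimistically assume f has no free variables *)
      let known_f = S.add f known in
      let body_f = g known_f body in
      let known', body' =
        if S.is_empty (S.diff (cfv body_f) (S.of_list xs)) then (known_f, body_f)
        else begin
          (* wrong guess: rewind toplevel and redo without f in known *)
          toplevel := backup;
          (known, g known body)
        end
      in
      let zs = S.elements (S.diff (cfv body') (S.add f (S.of_list xs))) in
      toplevel := (f, zs, xs, body') :: !toplevel;
      let cont' = g known' cont in
      (* create a closure only if f is used as a variable afterwards *)
      if S.mem f (cfv cont') then MakeCls (f, zs, cont') else cont'
```

On the make_adder example, adder gets the free variable x and keeps its closure (it is returned as a value), while make_adder itself has no free variable and is called directly.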

4.11 Virtual Machine Code Generation (208 Lines) 

type SparcAsm.t (* instruction sequences *) 
type SparcAsm.exp (* atomic expressions *) 
type SparcAsm.fundef =
  { name : Id.l;
    args : Id.t list; (* int arguments *)
    fargs : Id.t list; (* float arguments *)
    body : SparcAsm.t;
    ret : Type.t (* return type *) }
type SparcAsm.prog =
  Prog of (Id.l * float) list * (* float table *)
          SparcAsm.fundef list *
          SparcAsm.t

val SparcAsm.fv : SparcAsm.t -> Id.t list (* use order *)
val Virtual.f : Closure.prog -> SparcAsm.prog

(* private *)

val Virtual.data : (Id.l * float) list ref (* float table *)
val Virtual.h : Closure.fundef -> SparcAsm.fundef
val Virtual.g : Type.t M.t -> Closure.t -> SparcAsm.t


After closure conversion, we generate SPARC Assembly. Since 
it is too hard to output real assembly, we first generate virtual ma- 
chine code similar to SPARC Assembly. Its main “virtual” aspects 
are: 


• Infinite number of variables (instead of finite number of regis- 
ters) 

• if-then-else expressions and function calls (instead of com- 
parisons, branches, and jumps) 

This virtual assembly is defined in module SparcAsm. The ML data type SparcAsm.exp corresponds almost one-to-one to the instructions of SPARC (except If and Call). Instruction sequences SparcAsm.t are either Ans, which returns a value at the end of a function, or a variable definition Let. The other instructions Forget, Save, and Restore will be explained later.

(* C(i) represents 13-bit immediates of SPARC *)
type id_or_imm = V of Id.t | C of int

type t =
  | Ans of exp
  | Let of (Id.t * Type.t) * exp * t
  | Forget of Id.t * t
and exp = (* excerpt *)
  | Set of int
  | SetL of Id.l
  | Add of Id.t * id_or_imm
  | Ld of Id.t * id_or_imm
  | St of Id.t * Id.t * id_or_imm
  | FAddD of Id.t * Id.t
  | LdDF of Id.t * id_or_imm
  | StDF of Id.t * Id.t * id_or_imm
  | IfEq of Id.t * id_or_imm * t * t
  | IfFEq of Id.t * Id.t * t * t
  | CallCls of Id.t * Id.t list * Id.t list
  | CallDir of Id.l * Id.t list * Id.t list
  | Save of Id.t * Id.t
  | Restore of Id.t


Virtual.f, Virtual.h, and Virtual.g are the three functions that translate closure-converted programs to virtual machine code. Virtual.f translates the whole program (the list of top-level functions and the expression of a main routine). Virtual.h translates each top-level function, and Virtual.g translates an expression. The point of these translations is to make explicit the memory accesses for creating, reading from, and writing to closures, tuples, and arrays. Data structures such as closures, tuples, and arrays are allocated in the heap, whose address is remembered in the special register SparcAsm.reg_hp.

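For example, to read from an array, we shift the index left according to the size of the element to be loaded (by 3 bits for 8-byte floating-point numbers and by 2 bits otherwise) to obtain the byte offset.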

| Closure.Get(x, y) ->
    let offset = Id.genid "o" in
    (match M.find x env with
     | Type.Array(Type.Unit) -> Ans(Nop)
     | Type.Array(Type.Float) ->
         Let((offset, Type.Int), SLL(y, C(3)),
             Ans(LdDF(x, V(offset))))
     | Type.Array(_) ->
         Let((offset, Type.Int), SLL(y, C(2)),
             Ans(Ld(x, V(offset))))
     | _ -> assert false)
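The shift amounts in this code encode element sizes of 8 bytes for floating-point numbers and 4 bytes otherwise. The following sketch just computes the resulting byte offset for an index; the `elt` type is a hypothetical stand-in for the relevant cases of Type.t.

```ocaml
(* Hypothetical element kinds: Word covers 4-byte ints and pointers *)
type elt = Unit | Word | Float

(* Byte offset of element i, mirroring SLL(y, C(3)) and SLL(y, C(2));
   None means no load is needed at all (unit arrays) *)
let offset elt i =
  match elt with
  | Unit -> None
  | Float -> Some (i lsl 3)
  | Word -> Some (i lsl 2)
```

So element 5 of a float array lives 40 bytes from the start of the array, and element 5 of an integer array 20 bytes.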

In tuple creation Closure.Tuple, each element is stored with floating-point numbers aligned to 8 bytes, and the starting address is used as the tuple's value. Closure creation Closure.MakeCls stores the address (label) of the function's body with the values of its free variables (also taking care of alignment) and uses the starting address as the closure's value. As mentioned in the previous section, this is easy because we generate separate closures with no sharing at all, even for mutually recursive functions. Accordingly, at the beginning of each top-level function, we load the values of free variables from the closure, where every closure-based function application (AppCls) is assumed to set the closure's address to register SparcAsm.reg_cl.
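As a worked example of the alignment rule, the sketch below computes the byte offset of each tuple element, assuming 4-byte words (integers and pointers), 8-byte floating-point numbers aligned to 8 bytes, and an 8-byte-aligned starting address. These sizes follow the shifts in the array code above; the function itself is illustrative, not Virtual's actual code.

```ocaml
type elt = Word | Float  (* hypothetical: 4-byte words, 8-byte floats *)

(* Returns the offset of each element and the total size in bytes *)
let layout elts =
  let align8 n = (n + 7) land (lnot 7) in
  let offsets, size =
    List.fold_left
      (fun (acc, off) e ->
        match e with
        | Word -> (off :: acc, off + 4)
        | Float -> let off = align8 off in (off :: acc, off + 8))
      ([], 0) elts
  in
  (List.rev offsets, size)
```

For a tuple (int, float), the float is padded from offset 4 to offset 8, giving offsets [0; 8] and a total size of 16 bytes.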

In addition, since SPARC Assembly does not support floating-point immediates, we need to create a constant table in memory. For this purpose, Virtual.g records floating-point constants in the global variable Virtual.data.

4.12 13-Bit Immediate Optimization (42 Lines) 

val Simm13.f : SparcAsm.prog -> SparcAsm.prog


In SPARC Assembly, most integer operations can take an immediate within 13 bits (no less than -4096 and less than 4096) as the second operand. An optimization using this feature is implemented in module Simm13. It is almost the same as constant folding and elimination of unnecessary definitions, except that the object language is virtual assembly and the constants are limited to 13-bit integers.
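A minimal sketch of the substitution half of this optimization, on a hypothetical two-instruction subset of the virtual assembly: a variable defined by `Set i` with i in the 13-bit range is replaced by the immediate `C i` where it appears as the second operand of `Add`. Removing the then-dead `Set` would be the job of the elimination step and is omitted here.

```ocaml
module M = Map.Make (String)

type id_or_imm = V of string | C of int
type exp = Set of int | Add of string * id_or_imm
type t = Ans of exp | Let of string * exp * t

(* signed 13-bit immediates range over [-4096, 4096) *)
let fits_simm13 i = -4096 <= i && i < 4096

(* rewrite one instruction using the known small constants in env *)
let g' env = function
  | Add (x, V y) when M.mem y env -> Add (x, C (M.find y env))
  | e -> e

let rec g env = function
  | Ans e -> Ans (g' env e)
  | Let (x, Set i, e) when fits_simm13 i -> Let (x, Set i, g (M.add x i env) e)
  | Let (x, e, cont) -> Let (x, g' env e, g env cont)
```

Constants outside the range, such as 5000, stay in a register because they cannot be encoded as an immediate.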

4.13 Register Allocation (262 Lines) 

[Update on September 17, 2008: The register allocator now uses a 
simpler algorithm. It omits the backtracking (ToSpill and NoSpill) 
explained below.] 


val RegAlloc.f : SparcAsm.prog -> SparcAsm.prog

(* private *)
type g_result =
  | NoSpill of SparcAsm.t * Id.t M.t
  | ToSpill of SparcAsm.t * Id.t list
val RegAlloc.h : SparcAsm.fundef -> SparcAsm.fundef
val RegAlloc.g : Id.t * Type.t (* dest *) ->
                 SparcAsm.t (* cont *) ->
                 Id.t M.t (* regenv *) ->
                 SparcAsm.t -> g_result
val RegAlloc.g' : Id.t * Type.t (* dest *) ->
                  SparcAsm.t (* cont *) ->
                  Id.t M.t (* regenv *) ->
                  SparcAsm.exp -> g_result


The most complex process in The MinCaml Compiler is register allocation, which maps an unbounded number of variables onto a finite number of registers. As discussed in Section 3, our register allocator adopts a greedy algorithm with backtracking for early spilling and look-ahead for register targeting.

4.13.1 Basics 

First of all, as a function calling convention, we assign arguments from the first register toward the last register. (Our compiler does not support functions with more arguments than there are registers. Such cases must be handled by programmers, for example by using tuples.) We set return values to the first register. These are processed in RegAlloc.h, which allocates registers in each top-level function.

After that, we allocate registers in function bodies and the main routine. RegAlloc.g takes an instruction sequence with a mapping regenv from variables to registers that represents the current register assignment, and returns the instruction sequence after register allocation. The basic policy of register allocation is to avoid registers already assigned to live variables. The set of live variables is calculated by SparcAsm.fv.
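The basic policy can be sketched as follows: collect the registers held by live variables and take the first register not among them. The four-register file and the register names are hypothetical; `None` signals that a spill would be needed.

```ocaml
module S = Set.Make (String)
module M = Map.Make (String)

let regs = ["r0"; "r1"; "r2"; "r3"]  (* hypothetical register file *)

(* regenv: variable -> register; live: variables live at this point.
   Returns the first register not held by a live variable, or None
   when all registers are occupied (a spill would be needed). *)
let alloc regenv live =
  let busy =
    S.fold
      (fun y acc ->
        match M.find_opt y regenv with Some r -> S.add r acc | None -> acc)
      live S.empty
  in
  List.find_opt (fun r -> not (S.mem r busy)) regs
```

A register held by a variable that is no longer live (such as b in the test below) is immediately available for reuse.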

However, when allocating registers in the instruction sequence e1 of let x = e1 in e2, not only e1 but also its “continuation” e2 must be taken into account for the calculation of live variables. For this reason, RegAlloc.g and RegAlloc.g' (which allocates registers in individual instructions) also take the continuation instruction sequence cont and use it in the calculation of live variables.

4.13.2 Spilling 

We sometimes cannot allocate any register that is not live, since the 
number of variables is infinite while that of registers is not. In this 
case, we have to save the value of some register to memory. This 
process is called register spilling. Unlike in imperative languages, 
the value of a variable in functional languages does not change after 
its definition. Therefore, it is better to save the value of a variable 
as early as possible, if at all, in order to make room.

Whenever a variable x needs to be saved, RegAlloc.g returns a value ToSpill, and the allocator goes back to the definition of x to insert a virtual instruction Save. In addition, since we want to remove x from the set of live variables at the point where x is spilled, we insert another virtual instruction Forget to exclude x from the set of free variables. For this purpose, the value ToSpill carries not only the list xs of spilled variables, but also the instruction sequence e in which Forget has been inserted. After saving x, we redo the register allocation against e.

Saving is necessary not only when registers are spilled, but 
also when functions are called. MinCaml adopts the caller-save 
convention, so every function call is assumed to destroy the values 
of all registers. Therefore, we need to save the values of all registers 
that are live at that point, as implemented in an auxiliary function RegAlloc.g'_call. This is why ToSpill holds a list of spilled variables rather than a single one.

When saving is unnecessary, we return the register-allocated instruction sequence e' (with the new regenv) in another value NoSpill.

To put it all together, the data type for the returned values of these functions is defined as follows:

type g_result =
  | NoSpill of
      SparcAsm.t (* instruction sequence
                    with registers allocated *)
      * Id.t M.t (* new regenv *)
  | ToSpill of
      SparcAsm.t (* instruction sequence
                    with Forget inserted *)
      * Id.t list (* spilled variables *)

4.13.3 Unspilling 

A spilled variable will be used sooner or later, in which case 
RegAlloc.g’ (the function that allocates registers in individual 
instructions) raises an exception as it cannot find the variable 
in regenv. This exception is handled in an auxiliary function
RegAlloc.g'_and_unspill, where virtual instruction Restore
is inserted to restore the value of the variable from memory to a 
register. 

However, this insertion of Restore pseudo-instructions breaks a fundamental property of our virtual assembly that every variable is assigned just one register. In particular, it leads to a discrepancy when two flows of a program join after conditional branches. For example, in the then-clause of the expression (if f() then x - y else y - x) + x + y, variable x may be restored into register r0 and y into r1, while they may be restored in the other order in the else-clause. (A similar discrepancy also arises concerning whether a variable is spilled or not.)

In imperative languages, such “discrepancies” are so common 
that a more sophisticated notion of def-use chains is introduced 
and used as the unit of register allocation (instead of individual 
variables). In MinCaml, fortunately, those cases are less common and can be treated in a simpler manner: whenever a variable is not in the same register after conditional branches, it is simply treated as spilled (and must be restored before being used again), as implemented in an auxiliary function RegAlloc.g'_if.
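The join rule can be sketched as a merge of the register mappings from the two branches: a variable survives only if both branches place it in the same register, and every other variable is reported as dropped (to be restored before its next use). This simplification ignores liveness and uses hypothetical string register names.

```ocaml
module M = Map.Make (String)

(* Merge the register assignments after the two branches of an if.
   A variable keeps its register only if both branches agree; all other
   variables are returned as "dropped", i.e. treated as spilled. *)
let join env1 env2 =
  let merged =
    M.merge
      (fun _ r1 r2 ->
        match r1, r2 with
        | Some a, Some b when a = b -> Some a
        | _ -> None)
      env1 env2
  in
  let all = M.union (fun _ a _ -> Some a) env1 env2 in
  let dropped =
    M.fold (fun x _ acc -> if M.mem x merged then acc else x :: acc) all []
  in
  (merged, List.rev dropped)
```

In the example from the text, x and y end up in swapped registers in the two branches, so both are dropped, while a variable w held in the same register on both sides is kept.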

4.13.4 Targeting 

When allocating registers, we not only avoid live registers, but also try to reduce unnecessary moves in the future. This is called register targeting [5], itself an instance of register coalescing [18]. For example, if a variable being defined will be the second argument of a function call, we try to allocate it on the second register. For another example, we try to allocate a variable on the first register if it will be returned as the result of a function. These are implemented in RegAlloc.target. For this purpose, RegAlloc.g and RegAlloc.g' also take a register dest as an argument, where the result of computation will be stored.
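Targeting can be sketched as a preference layered over the basic choice: try the target register first and fall back to the first free one. The register names, the `~busy`/`~target` labels, and the fallback order are assumptions for illustration, not RegAlloc.target's actual interface.

```ocaml
module S = Set.Make (String)

let regs = ["r0"; "r1"; "r2"; "r3"]  (* hypothetical register file *)

(* Prefer the target register (e.g. the argument or return-value register
   the variable will end up in) if it is free; otherwise take the first
   free register. None means every register is busy (a spill is needed). *)
let choose ~busy ~target =
  match target with
  | Some r when not (S.mem r busy) -> Some r
  | _ -> List.find_opt (fun r -> not (S.mem r busy)) regs
```

When the preferred register is free, the later move into it disappears; when it is busy, targeting silently degrades to the basic policy.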

4.13.5 Summary 

All in all, the main functions in module RegAlloc can be described 
as follows. 

RegAlloc.g dest cont regenv e allocates registers in in- 
struction sequence e. It takes into account the continuation instruc- 
tion sequence cont when calculating live variables. Already allo- 
cated variables in e are substituted with registers according to the 
mapping regenv. The value computed by e is stored to dest. 

RegAlloc.g' is similar to RegAlloc.g but takes individual instructions (SparcAsm.exp) instead of instruction sequences (SparcAsm.t). However, it still returns instruction sequences (not individual instructions) so that spilling and unspilling can be inserted. It uses auxiliary functions RegAlloc.g'_call and RegAlloc.g'_if to deal with spilling due to function calls and conditional branches, while unspilling is treated by another auxiliary function RegAlloc.g'_and_unspill.

All of the functions above return either NoSpill(e', regenv2) or ToSpill(e, xs). The former means that register allocation has succeeded: regenv2 is the new mapping from variables to registers, and e' is the instruction sequence where all variables have been substituted with the allocated registers. The latter means that register spilling is required: xs is the list of spilled variables, and e is the instruction sequence where Forget pseudo-instructions have been inserted. Both results must be treated by every caller of RegAlloc.g or RegAlloc.g'.

Finally, RegAlloc.h takes a top-level function definition and allocates registers. RegAlloc.f takes a whole program and allocates registers. Actually, it is the only function exported by module RegAlloc.

4.14 Assembly Generation (256 Lines) 

val Emit.f : ochan -> SparcAsm.prog -> unit
(* private *)

type dest = Tail | NonTail of Id.t
val Emit.h : ochan -> SparcAsm.fundef -> unit
val Emit.g : ochan -> dest * SparcAsm.t -> unit
val Emit.g' : ochan -> dest * SparcAsm.exp -> unit


At last, we reach the final phase: assembly generation. Having done most of the hard work (register allocation, in particular), it is easy to output SparcAsm.t as real SPARC Assembly by replacing virtual instructions with real ones. Conditional expressions are implemented by comparisons and branches. Save and Restore are implemented with stores and loads by calculating the set stackset of already saved variables (to avoid redundant saves) and the list stackmap of their locations in the stack. Function calls are a little trickier: the arguments may have to be re-arranged into the registers required by the calling convention, which amounts to a set of parallel moves, and Emit.shuffle sequentializes them:

(* given a list (xys) of parallel moves,
   implements it by sequential moves
   using a temporary register (tmp) *)
let rec shuffle tmp xys =
  (* remove identical moves *)
  let _, xys =
    List.partition (fun (x, y) -> x = y) xys in
  (* find acyclic moves *)
  match (List.partition
           (fun (_, y) -> List.mem_assoc y xys)
           xys) with
  | [], [] -> []
  | (x, y) :: xys, [] ->
      (* no acyclic moves; resolve a cyclic move *)
      (y, tmp) :: (x, y) ::
      shuffle tmp
        (List.map
           (function
             | (x', y') when x' = y -> (tmp, y')
             | xy -> xy)
           xys)
  | xys, acyc -> acyc @ shuffle tmp xys

Tail calls are detected and optimized in this module. For this purpose, function Emit.g (which generates assembly for instruction sequences) as well as function Emit.g' (which generates assembly for individual instructions) takes a value of data type Emit.dest that represents whether we are in a tail position:

type dest = Tail | NonTail of Id.t

If this value is Tail, we tail-call another function by a jump instruction, or set the result of computation to the first register and return by the ret instruction of SPARC. If it is NonTail(x), the result of computation is stored in x.

4.15 Main Routine, Auxiliary Modules, and Runtime 
Library (45 + 228 + 197 Lines) 

After parsing command-line arguments, the main routine of MinCaml applies all the processes above. It also repeats the five optimizations from β-reduction to elimination of unnecessary definitions until their result reaches a fixed point (or the number of iterations reaches the maximum specified by a user).
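The iteration can be sketched as a small higher-order function. Here `pass` stands in for the composition of the five optimizations, and the fixed point is detected by structural equality on the program; the name `iter` and the integer example in the test are purely illustrative.

```ocaml
(* Apply pass repeatedly until the result stops changing or the
   iteration limit runs out; fixed points are detected with (=). *)
let rec iter limit pass e =
  if limit = 0 then e
  else
    let e' = pass e in
    if e' = e then e else iter (limit - 1) pass e'
```

Structural equality is adequate here because K-normal forms are plain trees without functional values or cycles.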



Finally, we provide a few auxiliary modules, write the runtime routine stub.c, which allocates the heap and stack of MinCaml, implement external functions libmincaml.s in SPARC Assembly for I/O and math, and obtain The MinCaml Compiler.

5. Efficiency 

The main point of MinCaml was to let students understand how 
functional programs can be compiled into efficient code. So we 
had to demonstrate the efficiency of the code generated by Min- 
Caml. For this purpose, we implemented several applications and 
compiled them with MinCaml, Objective Caml, and GCC. Each 
program was written in the optimal style of each language imple- 
mentation, so that the compiler produces as fast code as possible 
(to the best of our knowledge) without changing the essential algorithms. These comparisons are never meant to be “fair,” in the
sense that MinCaml supports only a tiny language — in fact, it is 
intended to be minimal — while other compilers support real lan- 
guages. Rather, they must be understood as informal references. 

First, as small benchmarks, we chose three typical functional 
programs: Ackermann, Fibonacci, and Takeuchi (also known as 
Tak) functions. The first two of them test recursion on integers, 
and the last on floating-point numbers. The results are shown in 
Table 3. All the numbers are user-level execution times in seconds, 
measured by /usr/bin/time. 

The machine is a Sun Fire V880 (4 UltraSPARC III 1.2GHz, 8GB main memory, Solaris 9). MinCaml is given the option -inline 100, meaning to inline functions whose size (the number of syntactic nodes in K-normal forms) is less than 100. OCamlOpt is version 3.08.3 and given the options -unsafe -inline 100. GCC -m32 and GCC -m64 are version 4.0.0 20050319 and given the option -O3. GCC -m32 -mflat is version 3.4.3 (since more recent versions do not support -mflat) and given the same option -O3. Note that GCC4 (and, to a lesser degree, GCC3) often produces faster code than older versions such as GCC2.

Although small benchmarks typically suffer from subtle effects of low-level mechanisms in a particular processor (such as caches, alignments, and pipelines), our programs did not: indeed, looking at the assembly generated by each compiler, we found more obvious reasons for our results:

• Objective Caml and GCC3 do not inline recursive functions, 
while MinCaml and GCC4 do. 

• Objective Caml boxes — i.e., allocates in the heap — floating- 
point numbers passed as arguments (or returned as results) of 
functions in order to support polymorphism, though it does sup- 
port unboxed arrays (and records) of floating-point numbers. 

• GCC without -mflat (both -m32 and -m64) uses the register 
window mechanism of SPARC, which is almost always less ef- 
ficient than other function calling conventions because it saves 
(and restores) all registers including unused ones. 

• GCC with -mflat uses a callee-save convention instead of 
register windows, which is still suboptimal since it only saves 
registers in the prologues of functions (and restores them in 
their epilogues), not in the middle of them. 

• GCC4 reduces arithmetic expressions such as (n - 1) - 2, which appears after the inlining of Fibonacci, to n - 3.

• GCC -m32 (with or without -mflat) passes floating-point 
number function arguments through integer registers, which 
incurs an overhead. 

               MinCaml  OCamlOpt  GCC4   GCC4   GCC3
                        -unsafe   -m32   -m64   -m32 -mflat
Ackermann        0.3      0.3      1.3    1.8    1.0
Fibonacci        2.5      3.9      1.5    1.4    6.1
Takeuchi         1.6      3.8      3.7    1.6    5.5
Ray Tracing      3.4      7.5      2.3    2.9    2.6
Harmonic         2.6      2.6      2.0    2.0    2.0
Mandelbrot       1.8      4.6      1.7    1.7    1.5
Huffman          4.5      6.6      2.8    3.0    2.9

Table 3. Execution Time of Benchmark Programs

Second, we tested larger applications: ray tracing, a harmonic
function, the Mandelbrot set, and Huffman encoding. All of them
were first written in C and then ported to ML. In Objective Caml,
we adopted an imperative style with references and for-statements
whenever it was faster than a functional style. However, we always
used tail recursion in MinCaml, since it does not have any other
loop construct. The results are also in Table 3. Again, Objective
Caml tends to be slower than the other compilers because of boxing
when floating-point numbers are used as arguments of functions or
elements of tuples (which cannot be replaced with arrays because
they also contain elements of other types). MinCaml also tends
to be a little slower than GCC because loops are implemented by
tail-recursive functions, and entering (or leaving) them requires
extra saves (or restores) of variables not used within the loops.
In addition, GCC implements instruction scheduling for
floating-point operations in order to hide their latencies, while
MinCaml does not.
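To make the two loop styles concrete, here is a sketch of a harmonic-series sum written both ways in Objective Caml. It is an illustration only and not the paper's "harmonic function" benchmark. `harmonic_for` uses a reference and a for-loop, the imperative style we allowed for OCamlOpt. `harmonic_rec` uses a tail-recursive local function, the only loop form MinCaml has. Both names are hypothetical. In the second version the loop is a separate function, so entering it is a call. This is where the extra saves described above come in.

```ocaml
(* Imperative style: a reference cell updated in a for-loop. *)
let harmonic_for (n : int) : float =
  let sum = ref 0.0 in
  for i = 1 to n do
    sum := !sum +. (1.0 /. float_of_int i)
  done;
  !sum

(* Functional style: a tail-recursive local function acting as the loop. *)
let harmonic_rec (n : int) : float =
  let rec loop i acc =
    if i > n then acc else loop (i + 1) (acc +. (1.0 /. float_of_int i))
  in
  loop 1 0.0

let () = Printf.printf "%.6f %.6f\n" (harmonic_for 1000) (harmonic_rec 1000)
```

Both versions add the terms in the same order, so they return bit-identical floating-point results. Only the shape of the generated loop differs.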

To summarize, for these modest benchmarks that can be written
in our minimal language, the efficiency of MinCaml is comparable
to that of major compilers such as Objective Caml and GCC, with
the speed ratio ranging from about 6 times faster at best to about
twice as slow at worst.
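The extreme ratios in this summary can be recomputed from Table 3. The sketch below uses the table transcribed by hand into a list; the names `table3`, `ratios`, `best`, and `worst` are ours. It divides each other compiler's time by MinCaml's time, so a value above 1 means MinCaml is faster.

```ocaml
(* Rows of Table 3: benchmark, MinCaml time, then the other four columns
   (OCamlOpt -unsafe, GCC4 -m32, GCC4 -m64, GCC3 -m32 -mflat). *)
let table3 =
  [ ("Ackermann", 0.3, [ 0.3; 1.3; 1.8; 1.0 ]);
    ("Fibonacci", 2.5, [ 3.9; 1.5; 1.4; 6.1 ]);
    ("Takeuchi", 1.6, [ 3.8; 3.7; 1.6; 5.5 ]);
    ("Ray Tracing", 3.4, [ 7.5; 2.3; 2.9; 2.6 ]);
    ("Harmonic", 2.6, [ 2.6; 2.0; 2.0; 2.0 ]);
    ("Mandelbrot", 1.8, [ 4.6; 1.7; 1.7; 1.5 ]);
    ("Huffman", 4.5, [ 6.6; 2.8; 3.0; 2.9 ]) ]

(* Every ratio other_time /. mincaml_time; > 1.0 means MinCaml is faster. *)
let ratios =
  List.concat
    (List.map
       (fun (_, mincaml, others) -> List.map (fun t -> t /. mincaml) others)
       table3)

let best = List.fold_left max neg_infinity ratios
let worst = List.fold_left min infinity ratios

let () =
  Printf.printf "best: %.2fx faster, worst: %.2fx slower\n" best (1.0 /. worst)
```

This prints `best: 6.00x faster, worst: 1.79x slower`. The best case is Ackermann against GCC4 -m64 (1.8 / 0.3). The worst case is Fibonacci against GCC4 -m64 (2.5 / 1.4), which the summary rounds to "twice as slow".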

6. Related Work 

There exist many compilers for ML and its variants: Comp.Lang.ML 
FAQ [1] gives a comprehensive list. However, I am not aware of 
any publicly available compiler that is as simple and efficient as 
MinCaml. There also exist various textbooks and tutorials on com- 
pilation of functional languages, but most of them present com- 
pilers into byte code or other medium-level languages — not native 
assembly — which do not satisfy our requirement for efficiency. 
The only exception that I am aware of is a well-known book by 
Appel [5], which uses CPS as the intermediate language and is 
distinct from MinCaml as argued below. 

Hilsdale et al. [14] presented a compiler for a subset of Scheme, 
implemented in Scheme, that generates native assembly. However, 
efficiency of the generated code is not discussed at all, perhaps 
because it was not a goal in their compiler. 

Sarkar et al. [23] developed a compiler course (using Scheme) 
based on the nanopass framework, where the compiler consists 
of many small translation (or verification) processes written in 
a domain specific language developed for this purpose. Unlike 
nanopass, we chose to use ordinary ML as the meta language in 
order to avoid the overhead of understanding such a domain specific 
language itself, and to utilize the type system of ML for statically 
checking the syntactic validity of intermediate code even before 
running the compiler. 
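As an illustration of using ML's type system to enforce the shape of intermediate code, consider the hypothetical type below. It is loosely modeled on a K-normal form and is not MinCaml's actual `KNormal.t`. Operands of `Add` are variables only, so a nested term such as (x + y) + z cannot even be constructed. The inner sum must first be named with `let`.

```ocaml
type id = string

(* Operands of Add are variables, not arbitrary subterms, so nested
   arithmetic is rejected by the type checker rather than at run time. *)
type knormal =
  | Int of int
  | Var of id
  | Add of id * id
  | Let of id * knormal * knormal

(* A small evaluator over an association-list environment. *)
let rec eval (env : (id * int) list) (e : knormal) : int =
  match e with
  | Int n -> n
  | Var x -> List.assoc x env
  | Add (x, y) -> List.assoc x env + List.assoc y env
  | Let (x, e1, e2) -> eval ((x, eval env e1) :: env) e2

(* (x + y) + z must be written with an explicit let for the inner sum;
   Add (Add ("x", "y"), "z") would be a type error. *)
let example = Let ("t", Add ("x", "y"), Add ("t", "z"))

let () = Printf.printf "%d\n" (eval [ ("x", 1); ("y", 2); ("z", 3) ] example)
```

With x = 1, y = 2, z = 3 this prints `6`. The point is that ill-formed intermediate code fails to compile the compiler itself, before the compiler is ever run.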

Feeley [10] presented a Scheme-to-C compiler which is supposed
to be explained in “90 minutes” and implemented in fewer than 800
lines of Scheme. Its main focus is on CPS conversion and closure
conversion for first-class continuations and higher-order
functions. Optimizations are out of scope: indeed, the compiler is
reported to produce code 6 times slower than that of Gambit-C. By
contrast, our compiler is a little more complex but much more
efficient.

Dijkstra and Swierstra [9] are developing a compiler for Haskell
based on attribute grammars. It is presented as a sequence of
implementations of increasing complexity. So far, their main focus
seems to be on typing. To the best of my knowledge, only a little
code, and no documentation, is available for the compilation part
at this moment. In addition, the most complex version of their
compiler is already about 10,000 lines long, excluding an
implementation of their domain-specific language based on
attribute grammars.³

One of Appel's series of textbooks [6] implements, in ML, a
compiler for an imperative language called Tiger. This language
is not primarily functional and is fundamentally different from
ML. For instance, higher-order functions and type inference are
only optional [6, Chapters 15 and 16]. With those options, the
compiler is much more complex than ours.

MinCaml adopts a variant of K-normal forms [7] as an
intermediate language, which is itself a variant of A-normal
forms [11]. Another major intermediate language of functional
language compilers is continuation-passing style (CPS) [5]. The
crucial difference between K-normal forms and CPS, which led us to
choose the former, lies in conditional branches in non-tail
positions: since all conditional branches must be in tail
positions in CPS, non-tail branches are converted to tail branches
with closure creations and function applications, which incur
overheads and require optimizations (such as the so-called
“callee-save” registers or inter-procedural register allocation).
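The non-tail branch issue shows up even in a tiny example. The code below is written by hand in Objective Caml for illustration; it is not output of MinCaml or of any CPS compiler, and `f`, `f_cps`, and `join` are our own names. In the let-based version, the conditional simply yields a value that the code after it uses. In the CPS version both arms must be in tail position, so the code after the `if` becomes a join continuation `join`, a closure over `k` that both arms call.

```ocaml
(* Let-based style: the if-expression is in a non-tail position. *)
let f (x : int) : int =
  let y = if x > 0 then 10 else 20 in
  y + 1

(* CPS: every branch is in tail position, so the code after the if
   becomes a join continuation [join] that both branches call. *)
let f_cps (x : int) (k : int -> 'a) : 'a =
  let join y = k (y + 1) in
  if x > 0 then join 10 else join 20

let () = Printf.printf "%d %d\n" (f 5) (f_cps (-5) (fun r -> r))
```

This prints `11 21`. The extra closure and call for `join` are what a CPS-based compiler must then optimize away or make cheap.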

On the other hand, CPS compiles function calls in a very elegant
way, without assuming the notion of call stacks a priori.
Besides, K-normal forms have their own complications with non-tail
branches (cf. the second and third paragraphs of Section 4.13.3)
and, to a lesser degree, with let-expressions (cf. the last
paragraph of Section 4.13.1), which in essence stem from the same
root. Thus, it would also be interesting to see how simple and
efficient a compiler for education can be developed by using CPS
instead of let-based intermediate languages.

As we saw in Section 4.13, the most complex process in The 
MinCaml Compiler was register allocation. Although there exist 
more standard methods than ours such as graph coloring [8] and 
linear scan [21], we find them less clear (though much faster at 
compile-time) in the context of functional languages, in particular 
concerning where and how to insert spilling and unspilling. 
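For comparison, the core of linear-scan allocation [21] is short. Live intervals are visited in order of their start points, and intervals that have ended release their registers. When no register is free, the interval that ends last is spilled. The sketch below is a simplified textbook version under our own assumptions: intervals are given, registers are plain integers, and variable names are unique. It is not MinCaml's allocator, and it ignores where the spill and reload instructions would actually be placed. That placement is the part the text finds unclear for functional code.

```ocaml
(* A live interval of a variable: [start, stop] in instruction numbers. *)
type interval = { name : string; start : int; stop : int }

type location = Reg of int | Spill

(* Simplified linear scan with [k] registers; returns each variable's location. *)
let linear_scan (k : int) (intervals : interval list) : (string * location) list =
  if k < 1 then invalid_arg "linear_scan: k must be positive";
  let sorted = List.sort (fun a b -> compare a.start b.start) intervals in
  (* [active] pairs each interval that holds a register with that register. *)
  let rec go active free result = function
    | [] -> List.rev result
    | cur :: rest -> (
        (* Expire intervals that ended before [cur] starts; reclaim registers. *)
        let expired, active =
          List.partition (fun (iv, _) -> iv.stop < cur.start) active
        in
        let free = List.map snd expired @ free in
        match free with
        | r :: free' ->
            go ((cur, r) :: active) free' ((cur.name, Reg r) :: result) rest
        | [] ->
            (* No free register (so [active] has k entries): find the
               active interval that ends last. *)
            let last, r =
              List.fold_left
                (fun (b, rb) (iv, ri) -> if iv.stop > b.stop then (iv, ri) else (b, rb))
                (List.hd active) active
            in
            if last.stop > cur.stop then
              (* Give [last]'s register to [cur] and spill [last] instead. *)
              let active =
                (cur, r) :: List.filter (fun (iv, _) -> iv.name <> last.name) active
              in
              let result =
                (cur.name, Reg r)
                :: List.map (fun (n, l) -> if n = last.name then (n, Spill) else (n, l)) result
              in
              go active free result rest
            else go active free ((cur.name, Spill) :: result) rest)
  in
  go [] (List.init k (fun i -> i)) [] sorted
```

For example, with two registers and intervals a = [0, 10], b = [1, 3], c = [2, 8], d = [4, 6], this sketch spills a and assigns b to register 1, c to register 0, and d to register 1 (reused after b ends).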

7. Conclusion 

We presented an educational compiler, written in 2000 lines of ML,
for a minimal functional language. For several applications that
can be written in this language, we showed that our compiler
produces assembly code of comparable efficiency to Objective Caml
and GCC.

The use of MinCaml in Tokyo has been successful. Most of the
groups completed their compilers and ran ray tracing on their
CPUs. Some students liked ML so much that they started a portal
site (http://www.ocaml.jp/) and a mailing list, as well as a
translation of the manual of Objective Caml, all in Japanese.

Like many program transformations in functional languages, most
processes in our compiler are implemented by tree traversal over
abstract syntax and share many similarities with one another. For
instance, the functions KNormal.fv and Closure.fv are almost
identical except for the necessary differences, such as let rec
and make_closure. Such similarities could perhaps be exploited to
simplify the compiler even more through subtyping (by means of
polymorphic variants [13], for example) or generic programming in
the style of Generic Haskell (http://www.generic-haskell.org/).

³ Of course, line counts are not always an exact measure of
software complexity, in particular across different languages,
but they often approximate it with a certain precision.
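To illustrate the kind of tree traversal meant here, the following sketch computes free variables over a small hypothetical let-based language. The type `t` and function `fv` are our own illustrations. MinCaml's actual KNormal.fv and Closure.fv operate on richer syntax, and the text describes them only as traversals of this general kind. Binding forms such as `LetRec` (standing in for let rec, or for make_closure in a closure-converted language) remove the names they bind.

```ocaml
module S = Set.Make (String)

(* A hypothetical let-based language with a let rec for functions. *)
type t =
  | Int of int
  | Var of string
  | Add of string * string
  | Let of string * t * t
  | LetRec of string * string list * t * t (* name, params, body, rest *)
  | App of string * string list

(* Free variables, computed by a single tree traversal. *)
let rec fv (e : t) : S.t =
  match e with
  | Int _ -> S.empty
  | Var x -> S.singleton x
  | Add (x, y) -> S.of_list [ x; y ]
  | Let (x, e1, e2) -> S.union (fv e1) (S.remove x (fv e2))
  | LetRec (f, params, body, rest) ->
      (* f is bound in its own body and in the rest; params only in the body. *)
      let body_fv = S.diff (fv body) (S.of_list (f :: params)) in
      S.union body_fv (S.remove f (fv rest))
  | App (f, args) -> S.of_list (f :: args)

let () =
  let e = LetRec ("g", [ "y" ], Add ("x", "y"), App ("g", [ "z" ])) in
  print_endline (String.concat " " (S.elements (fv e)))
```

This prints `x z`. A second language with, say, a closure-making construct would need an almost identical function, which is the duplication that polymorphic variants or generic programming might remove.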

Although our language was designed to be minimal, its ex- 
tensions would be useful for more advanced — maybe graduate — 
courses, and perhaps as a vehicle for research prototypes. Features 
required for these purposes include polymorphism, data types, pat- 
tern matching, garbage collection, and modules. We are looking 
into the tradeoff between simplicity and efficiency of various meth- 
ods for implementing them. 

We chose SPARC Assembly as our target code because of 
its simplicity and availability in Tokyo, but re-targeting to IA-32 
would also be interesting from the viewpoint of popularization 
in spite of the more complex instruction set architecture. We are 
also looking into this direction — in particular, how to adapt our 
code generator to 2-operand instructions (which are destructive by 
definition) in a “functional” way. 
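As a rough picture of the 2-operand problem, the sketch below lowers a 3-operand addition dst := a + b into destructive 2-operand instructions of the form dst := dst + src. The instruction types are hypothetical, loosely IA-32-like, and do not reflect real assembler syntax. Only addition is handled because it is commutative. A subtraction whose destination equals its second operand would additionally need a temporary register.

```ocaml
type reg = string

(* 3-operand form, as on SPARC: dst <- src1 + src2. *)
type three = Add3 of reg * reg * reg

(* Destructive 2-operand form: Mov (d, s) is d <- s; Add2 (d, s) is d <- d + s. *)
type two = Mov of reg * reg | Add2 of reg * reg

let lower (Add3 (dst, a, b)) : two list =
  if dst = a then [ Add2 (dst, b) ]
  else if dst = b then [ Add2 (dst, a) ] (* relies on commutativity of + *)
  else [ Mov (dst, a); Add2 (dst, b) ]
```

For example, `lower (Add3 ("eax", "ebx", "ecx"))` yields `[Mov ("eax", "ebx"); Add2 ("eax", "ecx")]`, while the case where the destination already holds an operand needs no move at all.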